jump to navigation

Costing Concatenated Indexes With Range Scan Predicates Part II (Coming Back To Life) July 27, 2022

Posted by Richard Foote in Automatic Indexing, CBO, Column Statistics, Concatenated Indexes, Explain Plan For Index, Full Table Scans, Index Access Path, Index Column Order, Index Column Reorder, Index Internals, Index statistics, Leaf Blocks, Non-Equality Predicates, Oracle, Oracle Blog, Oracle Cost Based Optimizer, Oracle General, Oracle Index Seminar, Oracle Indexes, Oracle Statistics, Performance Tuning, Richard Foote Training.
add a comment

In my previous Part I post, I discussed how the CBO basically stops the index leaf block access calculations after a non-equality predicate. This means that for an index with the leading indexed column being accessed via an unselective non-equality predicate, a large percentage of the index’s leaf blocks might need to be scanned, making the index access path unviable.

In the example in Part I, an index on the ID, CODE columns was too expensive due to the unselective range-scan predicate based on the leading ID column.

To provide the CBO a potentially much more efficient access path, we need an index with the more selective CODE predicate to be the leading column:

SQL> CREATE INDEX radiohead_code_id_i ON radiohead(code, id);

Index created.

SQL> SELECT index_name, blevel, leaf_blocks, clustering_factor

FROM user_indexes WHERE index_name = 'RADIOHEAD_CODE_ID_I';

INDEX_NAME                        BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR
----------------------------- ---------- ----------- -----------------
RADIOHEAD_CODE_ID_I                    1         265             98619

If we now re-run the previous query:

SQL> SELECT * FROM radiohead WHERE id BETWEEN 1000 AND 5000 AND CODE = 140;

Execution Plan

-----------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                     |     4 |    72 |     6   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| RADIOHEAD           |     4 |    72 |     6   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | RADIOHEAD_CODE_ID_I |     4 |       |     2   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        806  bytes sent via SQL*Net to client
        608  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          4  rows processed

We notice the CBO is now using this new index, as the costs for this index-based plan have dropped significantly, down to just 6 (from the previous 116). This overall cost of 6 is lower than the cost of 105 for the Full Table Scan and hence the reason why this index-based plan is now chosen by the CBO.

This is all due entirely to the significant drop in costs in accessing the index itself, now just 2 (from the previous 112).

Importantly, these much lower costs are accurate as we can tell via the reduced number of consistent reads, now just 7 (from 114 from the previous index-based plan).

If we now look at the associated costings:

Effective Index Selectivity = CODE selectivity x ID selectivity

= (1/10000) x ((5000-1000)/(10000-1) + 2 x (1/10000))

= 0.0001 x ((4000/9999) + 0.0002)

= 0.0001 x 0.40024)

= 0.000040024

Effective Table Selectivity = same as Index Selectivity

= 0.000040024

 

The effective index selectivity of 0.000040024 is now much lower than the previous (0.40024), as the CBO can now consider the product of the selectivities of both columns).

If we now plug this improved effective index selectivity into the index path costing calculations:

Index IO Cost = blevel +

ceil(effective index selectivity x leaf_blocks) +

ceil(effective table selectivity x clustering_factor)

 

Index IO Cost = 1  +  ceil(0.000040024 x 265) + ceil(0.000040024 x 99034)

= 1 + 1  + 4

= 2 + 4

= 6

Index Access Cost  = IO Costs + CPU Costs (in this plan, 0% of total costs and so unchanged from the IO costs)

= 2 + 4

= 6

We can see how the respective 2 and 6 improved CBO index costings are derived.

So again, it’s important to note that Automatic Indexing is doing entirely the correct thing with these examples, when it creates an index with the equality based predicate columns as the leading columns of the index…

Automatic Indexes: Automatically Rebuild Unusable Indexes Part IV (“Nothing Has Changed”) May 31, 2022

Posted by Richard Foote in 19c, 19c New Features, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Column Order, Index Internals, Local Indexes, Mixing Auto and Manual Indexes, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle19c, Unusable Indexes.
1 comment so far

In a previous post, I discussed how Automatic Indexing (AI) does not automatically rebuild a manually built index that is in an Unusable state (but will rebuild an Unusable automatically created index).

The demo I used was a simple one, based on manually created indexes referencing a non-partitioned table.

In this post, I’m going to use a demo based on manually created indexes referencing a partitioned table.

I’ll start by creating a rather basic range-based partitioned table, using the RELEASE_DATE column to partition the data by year:

SQL> CREATE TABLE big_bowie (id number, album_id number, country_id number, release_date date,
total_sales number) PARTITION BY RANGE (release_date)
(PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2018 VALUES LESS THAN (TO_DATE('01-JAN-2019', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2019 VALUES LESS THAN (TO_DATE('01-JAN-2020', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2020 VALUES LESS THAN (TO_DATE('01-JAN-2021', 'DD-MON-YYYY')),
 PARTITION ALBUMS_2021 VALUES LESS THAN (MAXVALUE));

Table created.

SQL> INSERT INTO big_bowie SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
ceil(dbms_random.value(1,500000)) FROM dual CONNECT BY LEVEL <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE');

PL/SQL procedure successfully completed.

I’ll next manually create a couple indexes; a non-partitioned index based on just the ALBUM_ID column and a prefixed locally partitioned index, based on the columns RELEASE_DATE, TOTAL_SALES:

 

SQL> create index album_id_i on big_bowie(album_id);

Index created.

SQL> create index release_date_total_sales_i on big_bowie(release_date, total_sales) local;

Index created.

 

If we now re-organise just partition ALBUMS_2017 (without using the ONLINE clause):

SQL> alter table big_bowie move partition albums_2017;

Table altered.

This results in the non-partitioned index and the ALBUMS_2017 local index partition becoming Unusable:

SQL> select index_name, status from user_indexes where table_name='BIG_BOWIE';

INDEX_NAME                     STATUS
------------------------------ --------
ALBUM_ID_I                     UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     N/A

SQL> select index_name, partition_name, status from user_ind_partitions
     where index_name='RELEASE_DATE_TOTAL_SALES_I';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2014          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2015          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2016          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2017          UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2018          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2019          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2020          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2021          USABLE

Let’s now run a number of queries a number of times. The first series is based on a predicate on just the ALBUM_ID column, such as:

SQL> select * from big_bowie where album_id=42;

2000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1510748290

-------------------------------------------------------------------------------------------------
| Id  | Operation           | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           | 2000 | 52000 |    7959 (2) | 00:00:01 |       |       |
|   1 | PARTITION RANGE ALL |           | 2000 | 52000 |    7959 (2) | 00:00:01 |     1 |     8 |
| * 2 |  TABLE ACCESS FULL  | BIG_BOWIE | 2000 | 52000 |    7959 (2) | 00:00:01 |     1 |     8 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("ALBUM_ID"=42)
  - filter("ALBUM_ID"=42)

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      48593 consistent gets
      42881 physical reads
          0 redo size
      44289 bytes sent via SQL*Net to client
         52 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
       2000 rows processed

We’ll also run a series of queries based on both the RELEASE_DATE column using dates from the unusable index partition and the TOTAL_SALES column, such as:

SQL> select * from big_bowie where release_date='01-JUN-2017' and total_sales=42;

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3245457041

----------------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows | Bytes | Cost (%CPU) | Time     | Pstart| Pstop |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    1 |    26 |     986 (2) | 00:00:01 |       |       |
|   1 | PARTITION RANGE SINGLE |           |    1 |    26 |     986 (2) | 00:00:01 |     4 |     4 |
| * 2 |  TABLE ACCESS FULL     | BIG_BOWIE |    1 |    26 |     986 (2) | 00:00:01 |     4 |     4 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - storage("TOTAL_SALES"=42 AND "RELEASE_DATE"=TO_DATE(' 2017-06-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))
   - filter("TOTAL_SALES"=42 AND "RELEASE_DATE"=TO_DATE(' 2017-06-01 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
       5573 consistent gets
          0 physical reads
          0 redo size
        676 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          0 rows processed

Without a valid/usable index, the CBO currently has no choice but to use a Full Table Scan on the first query, and a Full Partition Scan on the partition with the unusable local index.

So what does AI make of things? Does it rebuild the unusable manually created indexes so the associated indexes can be used to improve these queries?

If we wait until the next AI task completes and check out the indexes on the table:

SQL> select index_name, status, partitioned from user_indexes where table_name='BIG_BOWIE';

INDEX_NAME                     STATUS   PAR
------------------------------ -------- ---
RELEASE_DATE_TOTAL_SALES_I     N/A      YES
ALBUM_ID_I                     UNUSABLE NO
SYS_AI_aw2825ffpus5s           VALID    NO
SYS_AI_2hf33fpvnqztw           VALID    NO

SQL> select index_name, partition_name, status from user_ind_partitions
     where index_name='RELEASE_DATE_TOTAL_SALES_I';

INDEX_NAME                     PARTITION_NAME       STATUS
------------------------------ -------------------- --------
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2014          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2015          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2016          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2017          UNUSABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2018          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2019          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2020          USABLE
RELEASE_DATE_TOTAL_SALES_I     ALBUMS_2021          USABLE

We notice that AI has created two new non-partitioned automatic indexes, while both the manually created indexes remain in the same unusable state. If we look at the columns associated with these new automatic indexes:

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BIG_BOWIE';

INDEX_NAME                     COLUMN_NAME          COLUMN_POSITION
------------------------------ -------------------- ---------------
ALBUM_ID_I                     ALBUM_ID                           1
RELEASE_DATE_TOTAL_SALES_I     RELEASE_DATE                       1
RELEASE_DATE_TOTAL_SALES_I     TOTAL_SALES                        2
SYS_AI_aw2825ffpus5s           ALBUM_ID                           1
SYS_AI_aw2825ffpus5s           RELEASE_DATE                       2
SYS_AI_2hf33fpvnqztw           TOTAL_SALES                        1
SYS_AI_2hf33fpvnqztw           RELEASE_DATE                       2

As we can see, AI has logically replaced both unusable indexes.

The manual index based on ALBUM_ID has been replaced with an inferior index based on the ALBUM_ID, RELEASE_DATE columns. Inferior in that the automatic index is both redundant (if only the manual index on ALBUM_ID were rebuilt) and in that it has the logically unnecessary RELEASE_DATE column to inflate the size of the index.

The manual index based on the RELEASE_DATE, TOTAL_SALES columns has been replaced with a redundant automatic index based on the reversed TOTAL_SALES, RELEASE_DATE columns.

Now, AI has indeed automatically addressed the current FTS performance issues associated with these queries by creating these indexes, but a better remedy would have been to rebuild the unusable manual indexes and hence negate the need for these redundant automatic indexes.

But currently (including with version 21.3), AI will NOT rebuild unusable manually created indexes, no matter the scenario, and will instead create additional automatic indexes if it’s viable for it to do so.

A reason why Oracle at times recommends dropping all current manually created secondary indexes before implementing AI (although of course this comes with a range of obvious issues and concerns).

If these manually created indexes didn’t exist, I’ll leave it as an exercise to the discernable reader on what automatic indexes would have been created…

As always, this restriction may change in future releases…

Announcement: Registration Links For Upcoming Webinars Now Open (“Join The Gang”) May 25, 2022

Posted by Richard Foote in 18c New Features, 19c New Features, 21c New Features, Index Internals, Index Internals Seminar, Indexing Tricks, Oracle 21c, Oracle General, Oracle Index Seminar, Oracle Indexing Internals Webinar, Oracle Performance Diagnostics and Tuning Seminar, Oracle Performance Diagnostics and Tuning Webinar, Oracle19c, Performance Tuning, Performance Tuning Seminar, Performance Tuning Webinar, Richard Foote Consulting, Richard Foote Seminars, Richard Foote Training, Richard Presentations.
add a comment

The registration links for my upcoming webinars running in August are now open!!!

The price of each webinar is $1,600 AUD. There is a special price of $2,750 AUD if you wish to attend both webinars (just use the Special Combo Price button).

(Note: Do NOT use the links if you’re an Australian resident. Please contact me at richard@richardfooteconsulting.com for additional payment info and tax invoice that includes additional GST).

Just click the below “Buy Now” buttons to book your place for these unique, highly acclaimed Oracle training events (see some of my testimonials for feedback by previous attendees to these training events):

 

Oracle Indexing Internals Webinar: 8-12 August 2022 (between 09:00 GMT and 13:00 GMT daily) – $1,600 AUD: SOLD OUT!!

Oracle Performance Diagnostics and Tuning Webinar: 22-25 August 2022 (between 09:00 GMT and 13:00 GMT daily) – $1,600 AUD: SOLD OUT!!

Special Combo Price for both August 2022 Webinars$2,750 AUD: SOLD OUT!!

 

The links allow you to book a place using either PayPal or a credit card. If you wish to pay via a different method or have any questions at all regarding these events, please contact me at richard@richardfooteconsulting.com.

As I mentioned previously, for those of you on my official waiting list, I will reserve a place for you for a limited time.

As this will probably be the last time I will run these events, remaining places are likely to go quickly. So please book your place ASAP to avoid disappointment…

 

Read below a brief synopsis of each webinar:

Oracle Indexing Internals

This is a must attend webinar of benefit to not only DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high performance Oracle-based applications. It’s a fun, but intense, content rich webinar that is suitable for people of all experiences (from beginners to seasoned Oracle experts).

Indexes are fundamental to every Oracle database and are crucial for optimal performance. However, there’s an incredible amount of misconception, misunderstanding and pure myth regarding how Oracle indexes function and should be maintained. Many applications and databases are suboptimal and run inefficiently primarily because an inappropriate indexing strategy has been implemented.

This webinar examines most available Oracle index structures/options and discusses in considerable detail how indexes function, how/when they should be used and how they should be maintained. A key component of the webinar is how indexes are costed and evaluated by the Cost Based Optimizer (CBO) and how appropriate data management practices are vital for an effective indexing strategy. It also covers many useful tips and strategies to maximise the benefits of indexes on application/database performance and scalability, as well as in maximising Oracle database investments. Much of the material is exclusive to this webinar and is not generally available in Oracle documentation or in Oracle University courses.

For full details, see: https://richardfooteconsulting.com/indexing-seminar/

 

Oracle Performance Diagnostics and Tuning

This is a must attend webinar aimed at Oracle professionals (both DBAs and Developers) who are interested in Performance Tuning.  The webinar details how to maximise the performance of both Oracle databases and associated applications and how to diagnose and address any performance issues as quickly and effectively as possible.

When an application suddenly runs “slow” or when people start complaining about the “poor performance” of the database, there’s often some uncertainty in how to most quickly and most accurately determine the “root” cause of any such slowdown and effectively address any associated issues. In this seminar, we explore a Tuning Methodology that helps Oracle professionals to both quickly and reliably determine the actual causes of performance issues and so ensure the effectiveness of any applied resolutions.

Looking at a number of real world scenarios and numerous actual examples and test cases, this webinar will show participants how to confidently and reliably diagnose performance issues. The webinar explores in much detail the various diagnostics tools and reports available in Oracle to assist in determining any database performance issue and importantly WHEN and HOW to effectively use each approach. Additionally, participants are also invited to share their own database/SQL reports, where we can apply the principles learnt in diagnosing the performance of their actual databases/applications.

One of the more common reasons for poor Oracle performance is inefficient or poorly running SQL. This seminar explores in much detail how SQL is executed within the Oracle database, the various issues and related concepts important in understanding why SQL might be inefficient and the many capabilities and features Oracle has in helping to both resolve SQL performance issues and to maintain the stability and reliability of SQL execution.

It’s a fun, but intense, content rich webinar that is suitable for people of all experiences (from beginners to seasoned Oracle experts).

For full details, see: https://richardfooteconsulting.com/performance-tuning-seminar/

 

If you have any questions about these events, please contact me at richard@richardfooteconsulting.com

 

Announcement: Dates Confirmed For Upcoming Webinars (“Here Today, Gone Tomorrow”) May 19, 2022

Posted by Richard Foote in 19c, 19c New Features, 21c New Features, Index Internals, Index Internals Seminar, Indexing Myth, Oracle, Oracle 21c, Oracle General, Oracle Index Seminar, Oracle Indexes, Oracle Indexing Internals Webinar, Oracle Performance Diagnostics and Tuning Webinar, Oracle19c, Performance Tuning, Performance Tuning Webinar, Richard Foote Seminars, Webinar.
add a comment

As promised last week, I have now finalised the dates for my upcoming webinars.

They will be run as follows (UPDATED):

Oracle Indexing Internals Webinar: 8-12 August 2022 (between 09:00 GMT and 13:00 GMT daily): SOLD OUT!!

Oracle Performance Diagnostics and Tuning Webinar: 22-25 August 2022 (between 09:00 GMT and 13:00 GMT daily): SOLD OUT!!

Special Combo Price for both August 2022 Webinars“: SOLD OUT!!

I’ll detail costings and how to register for these events in the coming days.

 

There is already quite a waiting list for both of these webinars and so I anticipate available places will likely go quickly. Sorry to all those who have been waiting for so long and thank you for your patience. Please note for those on the waiting list, I already have places reserved for you.

It’s highly likely these will be the last time I’ll ever run these highly acclaimed training events (yes, I’m getting old)…

So don’t miss this unique opportunity to learn important skills in how to improve the performance and scalability of both your Oracle based applications and backend Oracle databases, in the comfort of your own home or office.

Read below a brief synopsis of each webinar:

Oracle Indexing Internals

This is a must attend webinar of benefit to not only DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high performance Oracle-based applications. It’s a fun, but intense, content rich webinar that is suitable for people of all experiences (from beginners to seasoned Oracle experts).

Indexes are fundamental to every Oracle database and are crucial for optimal performance. However, there’s an incredible amount of misconception, misunderstanding and pure myth regarding how Oracle indexes function and should be maintained. Many applications and databases are suboptimal and run inefficiently primarily because an inappropriate indexing strategy has been implemented.

This seminar examines most available Oracle index structures/options and discusses in considerable detail how indexes function, how/when they should be used and how they should be maintained. A key component of the seminar is how indexes are costed and evaluated by the Cost Based Optimizer (CBO) and how appropriate data management practices are vital for an effective indexing strategy. It also covers many useful tips and strategies to maximise the benefits of indexes on application/database performance and scalability, as well as in maximising Oracle database investments. Much of the material is exclusive to this seminar and is not generally available in Oracle documentation or in Oracle University courses.

For full details, see: https://richardfooteconsulting.com/indexing-seminar/

 

Oracle Performance Diagnostics and Tuning

This is a must attend webinar aimed at Oracle professionals (both DBAs and Developers) who are interested in Performance Tuning.  The webinar details how to maximise the performance of both Oracle databases and associated applications and how to diagnose and address any performance issues as quickly and effectively as possible.

When an application suddenly runs “slow” or when people start complaining about the “poor performance” of the database, there’s often some uncertainty in how to most quickly and most accurately determine the “root” cause of any such slowdown and effectively address any associated issues. In this seminar, we explore a Tuning Methodology that helps Oracle professionals to both quickly and reliably determine the actual causes of performance issues and so ensure the effectiveness of any applied resolutions.

Looking at a number of real world scenarios and numerous actual examples and test cases, this webinar will show participants how to confidently and reliably diagnose performance issues. The webinar explores in much detail the various diagnostics tools and reports available in Oracle to assist in determining any database performance issue and importantly WHEN and HOW to effectively use each approach. Additionally, participants are also invited to share their own database/SQL reports, where we can apply the principles learnt in diagnosing the performance of their actual databases/applications.

One of the more common reasons for poor Oracle performance is inefficient or poorly running SQL. This seminar explores in much detail how SQL is executed within the Oracle database, the various issues and related concepts important in understanding why SQL might be inefficient and the many capabilities and features Oracle has in helping to both resolve SQL performance issues and to maintain the stability and reliability of SQL execution.

It’s a fun, but intense, content rich webinar that is suitable for people of all experiences (from beginners to seasoned Oracle experts).

For full details, see: https://richardfooteconsulting.com/performance-tuning-seminar/

 

Keep an eye out in the coming days on costings and how to register for these events.

If you have any questions about these events, please contact me at richard@richardfooteconsulting.com

Automatic Indexes: Scenarios Where Automatic Indexes NOT Created Part II (“Ragazzo Solo, Ragazza Sola” April 27, 2022

Posted by Richard Foote in 19c, 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Constraints, Exadata, Foreign Keys, Full Table Scans, Index Internals, Oracle, Oracle 21c, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle General, Oracle Indexes, Oracle19c, Performance Tuning.
1 comment so far

In my last post, I discussed how Automatic Indexing doesn’t create an automatic index in the scenario where the minimum or maximum of a column is required.

Another scenario when an automatic index is not created is when we hit issues associated with a missing index on a Foreign Key (FK) constraint.

As I’ve discussed many times previously, if you delete a parent record without an index on the dependant FK constraints, you hit a number of issues including having to perform a (potentially expensive and problematic) Full Table Scan (FTS) on the child tables and the associated locking problems.

To illustrate, I’ll first create a small parent table:

SQL> create table daddy (id number constraint daddy_pk primary key , name varchar2(42));

Table created.

SQL> insert into daddy select rownum, 'David Bowie '|| rownum from dual connect by level <= 10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'DADDY');

PL/SQL procedure successfully completed.

And then a somewhat larger child table, with no index on the associated foreign key constraint:

SQL> create table kiddy (id number constraint kiddy_pk primary key , code1 number constraint daddy_fk references daddy(id), code2 number, code3 number, name varchar2(42));

Table created.

SQL> insert into kiddy select rownum, mod(rownum,1000)+1000 , mod(rownum, 10000)+1, mod(rownum, 100000)+1, 'Ziggy Stardust '|| rownum from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'KIDDY');

PL/SQL procedure successfully completed.

 

If we delete a number of parent rows, for example:

SQL> delete from daddy where id = 101;

1 row deleted.

Execution Plan
----------------------------------------------------------
Plan hash value: 1477800718

-------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU) |   Time     |
-------------------------------------------------------------------------------
|  0 | DELETE STATEMENT  |          |    1 |     4 |       1 (0) |   00:00:01 |
|  1 | DELETE            | DADDY    |      |       |             |            |
|* 2 | INDEX UNIQUE SCAN | DADDY_PK |    1 |     4 |       1 (0) |   00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=101)

Statistics
----------------------------------------------------------
         18 recursive calls
         13 db block gets
     117462 consistent gets
      22292 physical reads
    4645500 redo size
        204 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

We notice that even though we only delete one row from a relatively small table, we perform a large number of consistent gets (117462) due to the necessary FTS on the child table, as Oracle is forced to check the table for any possible FK violations. Without an index on the child CODE1 column, Oracle has no choice but to perform the relatively expensive FTS.

Additionally, if we have an existing transaction of a child table (in Session 1):

SQL> insert into kiddy values (10000001,1042,1042,1042,'Iggy Pop');

1 row created.

And then in another session attempt to delete a parent row (in Session 2):

SQL> delete from daddy where id = 112;

The delete hangs in a locked state due to the child transaction in Session 1. This can lead to further locking issues in other sessions (Session 3):

insert into kiddy values (10000002,1042,1042,1042,'Iggy Pop');

 

The FTS on the child table and these associated locks can all be avoided by having an index on the FK constraint, as the index can then be used to effectively police the constraint during such delete operations.

What does AI do in this scenario?

Currently, nothing.

I’ve been unable to ever get AI to create a usable automatic index in this scenario. In Oracle Database 19c, I’ve not been able to get an AI created at all. In Oracle Database 21c, the best I’ve seen has been a Unusable/Invisible AI:

SQL> select index_name, index_type, auto, constraint_index, visibility, status, num_rows from user_indexes where table_n
ame='KIDDY';

INDEX_NAME                     INDEX_TYPE                  AUT CON VISIBILIT STATUS     NUM_ROWS
------------------------------ --------------------------- --- --- --------- -------- ----------
KIDDY_PK                       NORMAL                      NO  YES VISIBLE   VALID      10000004
SYS_AI_31thttf8v6r35           NORMAL                      YES NO  INVISIBLE UNUSABLE   10000004

SQL> select index_name, column_name, column_position from user_ind_columns where table_name='KIDDY';

INDEX_NAME                     COLUMN_NAME     COLUMN_POSITION
------------------------------ --------------- ---------------
KIDDY_PK                       ID                            1
SYS_AI_31thttf8v6r35           CODE1                         1

So you may need to manually create such an index on the FK constraint to improve performance and eliminate these locking issues:

SQL> create index kiddy_code1_i on kiddy(code1);

Index created.

SQL> delete from daddy where id = 142;

1 row deleted.

Execution Plan
----------------------------------------------------------
Plan hash value: 1477800718

-------------------------------------------------------------------------------
| Id | Operation         | Name     | Rows | Bytes | Cost (%CPU) |   Time     |
-------------------------------------------------------------------------------
|  0 | DELETE STATEMENT  |          |    1 |     4 |       1 (0) |   00:00:01 |
|  1 | DELETE            | DADDY    |      |       |             |            |
|* 2 | INDEX UNIQUE SCAN | DADDY_PK |    1 |     4 |       1 (0) |   00:00:01 |
-------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=142)

Statistics
----------------------------------------------------------
          1 recursive calls
          8 db block gets
          2 consistent gets
          2 physical reads
        132 redo size
        204 bytes sent via SQL*Net to client
         41 bytes received via SQL*Net from client
          1 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

Consistent gets have dropped off massively (down to just 8) as Oracle can now use the index to avoid the FTS search on the child table. The associated locking issues are eliminated as well.

Note: As always, this AI behaviour can always change in the future…

Automatic Indexing: Deferred Invalidations (“The Post War Dream”) April 19, 2022

Posted by Richard Foote in 21c New Features, Automatic Indexing, Autonomous Database, Autonomous Transaction Processing, CBO, Deferred Invalidation, Exadata, Function Based Indexes, Index Access Path, Index Internals, JSON, Oracle, Oracle Blog, Oracle Cloud, Oracle Cost Based Optimizer, Oracle Indexes, Richard's Blog.
1 comment so far

In my previous post on how JSON expressions can now be automatically indexed, I mentioned there was an outstanding issue with the associated CBO execution plan, immediately post the creation of the automatic index:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      | 20000 |   12M |    1524 (1) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           | 20000 |   12M |    1524 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv | 8000  |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
     234168 consistent gets
     200279 physical reads
          0 redo size
       1595 bytes sent via SQL*Net to client
        526 bytes received via SQL*Net from client
          3 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
          1 rows processed

If we look at the number of recursive calls, we notice that it remains at 0. If we look at both the number of consistent gets (234168) and physical reads (200279), they both remain very high and identical to that of the previous Full Table Scan plan.

Basically, although autotrace suggests the newly created automatic index is being used, in fact the previous Full Table Scan plan is still being invoked.  (Note: this of course is one of the dangers of the autotrace plan, in that it might not display the actual plan being invoked).

So what’s going on here?

The Oracle Database 21c New Features Guide makes the following point: “an enhancement has been introduced to reduce the overhead of cursor invalidations when a new automatic index is created”.

Oracle 12.2 introduced a new feature in which one can now defer the invalidation of dependent SQL cursors when an index is created or modified. I’ve of course discussed this previously in this 12.2 Index Deferred Invalidation post.

When an automatic index is created in 21c, the current SQL cursors are NOT invalidated (to reduce the overhead of having to potentially reparse of large number of current SQL cursors). However, this means that currently inefficient SQL statements will keep their existing sub-optimal execution plans post the creation of newly created automatic indexes, until the existing SQL cursors aged out.

At which point, the new CBO plan using the automatic index will actually be invoked:

SQL> select * from bowie_json where json_value(bowie_order, '$.PONumber')='42';

Execution Plan
----------------------------------------------------------
Plan hash value: 832017402

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) |  Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |   671 |       4 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_JSON           |    1 |   671 |       4 (0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN                    | SYS_AI_ayvj257jd93cv |    1 |       |       3 (0) |  00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("BOWIE_ORDER" /*+ LOB_BY_VALUE */ FORMAT OSON , '$.PONumber' RETURNING
           VARCHAR2(4000) ERROR ON ERROR NULL ON EMPTY)='42')

Statistics
----------------------------------------------------------
          30 recursive calls
           0 db block gets
          46 consistent gets
          11 physical reads
           0 redo size
        1595 bytes sent via SQL*Net to client
         526 bytes received via SQL*Net from client
           3 SQL*Net roundtrips to/from client
           0 sorts (memory)
           0 sorts (disk)
           1 rows processed

So just be aware in Oracle Database 21c that your beautifully created automatic indexes may not actually get used as desired for a period of time…

Oracle 19c Automatic Indexing: Indexing With Stale Statistics Part II (Survive) October 7, 2020

Posted by Richard Foote in 19c, 19c New Features, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Exadata, Full Table Scans, Index Internals, Index statistics, Oracle, Oracle General, Oracle Indexes, Oracle Statistics, Oracle19c, Performance Tuning, Stale Statistics.
1 comment so far

 

 

In my previous post, I discussed how having stale statistics, usually a bad idea, is especially problematic with regard Automatic Indexes as it usually results in viable automatic indexes only being created in an UNUSABLE/INVISIBLE state.

If we were to now to collect the missing statistics:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_STALE');

PL/SQL procedure successfully completed.

SQL> select table_name, num_rows, blocks, last_analyzed from user_tables
where table_name='BOWIE_STALE';

TABLE_NAME             NUM_ROWS     BLOCKS LAST_ANAL
-------------------- ---------- ---------- ---------
BOWIE_STALE            10000000      39677 06-JUL-20

SQL> select column_name, num_distinct, density, histogram, last_analyzed from user_tab_cols
where table_name='BOWIE_STALE';

COLUMN_NAME          NUM_DISTINCT    DENSITY HISTOGRAM       LAST_ANAL
-------------------- ------------ ---------- --------------- ---------
ID                       10000000          0 HYBRID          06-JUL-20
CODE                       971092    .000001 HYBRID          06-JUL-20
NAME                            1 4.9416E-08 FREQUENCY       06-JUL-20

 

If we now repeatedly re-run the problematic query many times:

 

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)|  Time       |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)|  00:00:01   |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)|  00:00:01   |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

 

The CBO is forced to use the FTS as the current Automatic Index is in an UNUSABLE/INVISIBLE state.

If we wait for the next Automatic Indexing reporting period:

 

SQL> select dbms_auto_index.report_last_activity('text', 'ALL', 'ALL' ) report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start              : 06-JUL-2020 05:12:42
Activity end                : 06-JUL-2020 05:13:34
Executions completed        : 1
Executions interrupted      : 0
Executions with fatal error : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates           : 0
Indexes created            : 0
Space used                 : 0 B
Indexes dropped            : 0
SQL statements verified    : 0
SQL statements improved    : 0
SQL plan baselines created : 0
Overall improvement factor : 0x
-------------------------------------------------------------------------------

SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes   : 0
Space used       : 0 B
Unusable indexes : 0
-------------------------------------------------------------------------------

 

We notice that the Automatic Indexing process has nothing to report. Even though the problematic query is repeatedly executed, the SQL is now effectively on a blacklist and is not re-considered by the Automatic Indexing process.

If we look at the index details on the table:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_STALE';

INDEX_NAME             AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
BOWIE_STALE_PK         NO  YES VISIBLE   DISABLED      VALID      10000000       20164             59110
SYS_AI_300kk2unp8tr0   YES NO  INVISIBLE ADVANCED LOW  UNUSABLE   10000000       23058           4147514 

 

So the Automatic Index (SYS_AI_300kk2unp8tr0) is still UNUSABLE and INVISIBLE and can not be used by the CBO.

NOTE: In earlier patches of Oracle Database 19c (I’m using version 19.5.0.0.0 in this demo), I identified some scenarios after stale statistics when indexes were created in but in a VALID/INVISIBLE state, such that they could still not be used by the CBO in general database sessions.

If we simply re-run the same queries again from the time when the dependant object statistics were stale, any SQL is just ignored by the Automatic Indexing process.

As such, if we now subsequently re-run the problematic query again:

SQL> select * from bowie_stale where code=42;

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 65903426

-----------------------------------------------------------------------------------------
| Id | Operation                | Name        | Rows | Bytes | Cost (%CPU)| Time        |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT         |             |   10 |   230 |    544 (14)| 00:00:01    |
|* 1 | TABLE ACCESS STORAGE FULL| BOWIE_STALE |   10 |   230 |    544 (14)| 00:00:01    |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
    filter("CODE"=42)

Note
-----
- automatic DOP: Computed Degree of Parallelism is 1

Statistics
----------------------------------------------------------
          0 recursive calls
          0 db block gets
      39430 consistent gets
      39421 physical reads
          0 redo size
        610 bytes sent via SQL*Net to client
        361 bytes received via SQL*Net from client
          2 SQL*Net roundtrips to/from client
          0 sorts (memory)
          0 sorts (disk)
         10 rows processed

 

Again, the CBO has no choice here with no viable VALID/VISIBLE index present but to perform a FTS, even though its getting the cardinality estimates spot on since statistics gathering.

 

In Part III I’ll discuss how to get this query to finally use the Automatic Index and improve its performance, although if you’re a regular reader of the blog you should already know the solution…

Oracle 19c Automatic Indexing: Poor Data Clustering With Autonomous Databases Part III (Star) August 11, 2020

Posted by Richard Foote in 19c, 19c New Features, Attribute Clustering, Automatic Indexing, Autonomous Data Warehouse, Autonomous Database, Autonomous Transaction Processing, CBO, Clustering Factor, Data Clustering, Exadata, Index Access Path, Index Internals, Index statistics, Oracle, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
2 comments

In Part I we looked at a scenario where an index was deemed to be too inefficient for Automatic Indexing to create a VALID index, because of the poor clustering of data within the table.

In Part II we improved the data clustering but the previous SQLs could still not generate a new Automatic Index because they had effectively been blacklisted.

So how do we get Automatic Indexing to improve the performance of these queries?

Basically, we need to run some new SQL statements to those previously run which have not been blacklisted, that can make the Automatic Indexing process kick in and create the necessary indexes.

For example, if we now run the following SQL statements that have not previously run:

select * from nickcave where code=1;

select * from nickcave where code=2;

select * from nickcave where code=3;

 

And now wait for the next Automatic Indexing process period and look at the following (partial) Automatic Indexing report:

 

REPORT

--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 22-JUN-2020 04:26:31
Activity end                 : 22-JUN-2020 04:27:25
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0

-------------------------------------------------------------------------------
SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------

Index candidates                              : 0
Indexes created (visible / invisible)         : 1 (1 / 0)
Space used (visible / invisible)              : 167.77 MB (167.77 MB / 0 B)
Indexes dropped                               : 0
SQL statements verified                       : 3
SQL statements improved (improvement factor)  : 3 (76x)
SQL plan baselines created                    : 0
Overall improvement factor                    : 76x


INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
------------------------------------------------------------------------
| Owner | Table    | Index                | Key  | Type   | Properties |
------------------------------------------------------------------------
| BOWIE | NICKCAVE | SYS_AI_dh8pumfww3f4r | CODE | B-TREE | NONE       |
------------------------------------------------------------------------

VERIFICATION DETAILS
-------------------------------------------------------------------------------
The performance of the following statements improved:
-------------------------------------------------------------------------------

Parsing Schema Name  : BOWIE
SQL ID               : 5k1wmtu7um5q9
SQL Text             : select * from nickcave where code=1
Improvement Factor   : 76x

Execution Statistics:
-----------------------------

                   Original Plan                   Auto Index Plan
                   ----------------------------  ----------------------------
Elapsed Time (s):  1725103                       106145
CPU Time (s):      1534305                       62314
Buffer Gets:       291835                        779
Optimizer Cost:    9125                          792
Disk Reads:        0                             197
Direct Writes:     0                             0
Rows Processed:    500000                        100000
Executions:        5                             1

 

We can see that an index has indeed now been created on the CODE column because one of the new statements is now deemed to be 76x more efficient thanks to the new index.

If we look at details of this new Automatic Index:

 

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='NICKCAVE';

INDEX_NAME           AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
-------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_dh8pumfww3f4r YES NO  VISIBLE   DISABLED      VALID      10000000       19518             57983

SQL> select index_name, column_name, column_position from user_ind_columns
where table_name='NICKCAVE'
order by index_name, column_position;

INDEX_NAME           COLUMN_NAME          COLUMN_POSITION
-------------------- -------------------- ---------------
SYS_AI_dh8pumfww3f4r CODE                               1

 

We can see that the index is now indeed VALID and VISIBLE with a much improved Clustering Factor at just 57983.

If we now re-run newer SQL statement:

 

SQL> select * from nickcave where code=1;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |           INDEX RANGE SCAN            | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   8 - access("CODE"=1)

Statistics
----------------------------------------------------------
          12  recursive calls
           0  db block gets
         779  consistent gets
           0  physical reads
         176  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

We notice the SQL statement is now indeed using this new Automatic Index.

If we now re-run our original SQL statement that had been using a FTS execution plan and that we couldn’t make Automatic Indexing create a VALID index because when originally run, the data clustering was too poor within the table:

SQL> select * from nickcave where code=42;

100000 rows selected.

Execution Plan
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name                | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |                      |  100K | 3613K |  792   (2) | 00:00:01 |
|   1 |  PX COORDINATOR                       |                      |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)                 | :TQ10001             |  100K | 3613K |  792   (2) | 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| NICKCAVE             |  100K | 3613K |  792   (2) | 00:00:01 |
|   4 |     BUFFER SORT                       |                      |       |       |            |          |
|   5 |      PX RECEIVE                       |                      |  100K |       |  205   (4) | 00:00:01 |
|   6 |       PX SEND HASH (BLOCK ADDRESS)    | :TQ10000             |  100K |       |  205   (4) | 00:00:01 |
|   7 |        PX SELECTOR                    |                      |       |       |            |          |
|*  8 |         INDEX RANGE SCAN              | SYS_AI_dh8pumfww3f4r |  100K |       |  205   (4) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

    8 - access("CODE"=42)

Statistics
----------------------------------------------------------
          14  recursive calls
           4  db block gets
         780  consistent gets
         198  physical reads
       15224  redo size
     2363897  bytes sent via SQL*Net to client
       73914  bytes received via SQL*Net from client
        6668  SQL*Net roundtrips to/from client
           2  sorts (memory)
           0  sorts (disk)
      100000  rows processed

 

This query is now also finally using the newly created index, because the CBO now too deems it to be more efficient with an index based execution plan.

The moral of the story. Automatic Indexing may initially deem a potential index to not be efficient enough to be created. However, things may change such as the clustering of table data (or the distribution of data values, etc. etc.) that may make a new index now viable. This though requires a NEW SQL statement to be executed, such that a non-blacklisted SQL can invoke the Automatic Indexing process to create the necessary Automatic Index.

Of course, things may change in the future. Future releases may have the facility to automatically re-cluster the data in tables optimally based on existing workloads and may also have a mechanism to identify that things have sufficient “changed” such that previously “failed” SQL statements from an Automatic Indexing perspective may warrant reevaluation.

This has only been tested up to version Oracle Database 19.5 of the Oracle Autonomous Database environments.

Oracle Database 19c Automatic Indexing: Index Compression (Ghosteen) December 9, 2019

Posted by Richard Foote in 19c, 19c New Features, Advanced Index Compression, Automatic Indexing, AUTO_INDEX_COMPRESSION, Index Column Order, Index Compression, Index Internals.
2 comments

 

 

In my previous post on Automatic Indexing, I discussed how the default index column order (in absence of other factors) is column id, the order in which the columns are defined in the table. In this post, I’ll explore if this changes if index compression is also implemented.

By default, Automatic Indexing does NOT use index compression. However, if you have access to the Advanced Compression option, you have the choice to turn on index compression in the following manner:

SQL> EXEC DBMS_AUTO_INDEX.CONFIGURE('AUTO_INDEX_COMPRESSION','ON');

PL/SQL procedure successfully completed.

Note: the AUTO_INDEX_COMPRESSION parameter is not actually documented, which could be a documentation bug or that Oracle is not yet ready to release this capability. The above will enable Automatic Indexes to be created with Compress Advanced Low, which is the “no-brain” index compression option as it will compress (deduplicate) safely with negligible CPU overheads. However, index column order is still critical to ensure effective compression as we shall see…

We begin by creating a simple table, that has four columns of interest, CODE1, CODE2, CODE3 and STATUS. They are defined in this order within the table, but CODE1 has the most number of distinct values (5000000 distinct values), then CODE2 (1000), then CODE3 (10) and finally STATUS which only has the 1 distinct value.

SQL> create table bowie_compress (id number, code1 number, code2 number, code3 number, status varchar2(42), name varchar2(42));

Table created.

SQL> insert into bowie_compress select rownum, mod(rownum, 5000000)+1, mod(rownum, 1000)+1, mod(rownum, 10)+1, 'COMPLETED’,

'David Bowie' from dual connect by level <= 10000000;

10000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_COMPRESS');

PL/SQL procedure successfully completed.

 

In terms of being the most efficient from a compression perspective, it would be better to have the index defined in STATUS, CODE3, CODE2, CODE1 order, so that the leading columns in the index have the most duplicated values that enable effective deduplication and hence index compression. I’ve discussed the importance of index column for effective index compression a number of times previously. Arguably, it would be better not to actually index the STATUS column at all as with just 1 distinct value, provides no effective filtering benefits.

Having the CODE1 column as the leading column however with so many distinct values would effectively make the index non-compressible (with LOW compression), as the leading column would have too many distinct values to benefit much from compression.

So how does Automatic Indexing handle this scenario? Does it keep the same default index column order or does it alter the index column order to provide better index compression benefits?

Let’s run the following SQL with all four columns in the predicates and see what index Automatic Indexing creates…

 

SQL> select * from bowie_compress where code1=42 and code2=42 and code3=2 and status='COMPLETED';

Execution Plan

-----------------------------------------------------------------------------------------------
| Id  | Operation                    | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                |     1 |    41 |   9998  (5)| 00:00:01 |
|   1 |  PX COORDINATOR              |                |       |       |            |          |
|   2 |   PX SEND QC (RANDOM)        | :TQ10000       |     1 |    41 |   9998  (5)| 00:00:01 |
|   3 |    PX BLOCK ITERATOR         |                |     1 |    41 |   9998  (5)| 00:00:01 |
|*  4 |     TABLE ACCESS STORAGE FULL| BOWIE_COMPRESS |     1 |    41 |   9998  (5)| 00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

4 - storage("CODE1"=42 AND "CODE2"=42 AND "CODE3"=2 AND "STATUS"='COMPLETED')
     filter("CODE1"=42 AND "CODE2"=42 AND "CODE3"=2 AND "STATUS"='COMPLETED')

Statistics
----------------------------------------------------------
        6 recursive calls
        0 db block gets
    63562 consistent gets
        0 physical reads
        0 redo size
     1038 bytes sent via SQL*Net to client
      588 bytes received via SQL*Net from client
        2 SQL*Net roundtrips to/from client
        0 sorts (memory)
        0 sorts (disk)
        2 rows processed

 

If we look at the Automatic Indexing report for the period in which the above SQL was run:

 

SQL> select dbms_auto_index.report_last_activity() report from dual;

REPORT
--------------------------------------------------------------------------------
GENERAL INFORMATION
-------------------------------------------------------------------------------
Activity start               : 18-JUL-2019 00:18:35
Activity end                 : 18-JUL-2019 00:19:58
Executions completed         : 1
Executions interrupted       : 0
Executions with fatal error  : 0
-------------------------------------------------------------------------------

SUMMARY (AUTO INDEXES)
-------------------------------------------------------------------------------
Index candidates                                : 1
Indexes created (visible / invisible)           : 1 (1 / 0)
Space used (visible / invisible)                : 293.6 MB (293.6 MB / 0 B)
Indexes dropped                                 : 0
SQL statements verified                         : 1
SQL statements improved (improvement factor)    : 1 (63563.9x)
SQL plan baselines created                      : 0
Overall improvement factor                      : 63563.9x

-------------------------------------------------------------------------------
SUMMARY (MANUAL INDEXES)
-------------------------------------------------------------------------------
Unused indexes    : 0
Space used        : 0 B
Unusable indexes  : 0

-------------------------------------------------------------------------------
INDEX DETAILS
-------------------------------------------------------------------------------
The following indexes were created:
--------------------------------------------------------------------------------------------------
| Owner | Table          | Index                | Key                      | Type   | Properties |
--------------------------------------------------------------------------------------------------
| BOWIE | BOWIE_COMPRESS | SYS_AI_bkdhrsd29vd4f | CODE1,CODE2,CODE3,STATUS | B-TREE | NONE       |
--------------------------------------------------------------------------------------------------

 

We see that Automatic Index has created the index with all four columns from the SQL predicate in again the default column order as the column order as defined in the table (CODE1, CODE2, CODE3, STATUS). Even though Automatic Index Compression was enabled, it hasn’t considered the column cardinalities in its determination of best index column order.

Automatic Indexing has the tendency to index ALL columns specified in SQL predicates, regardless of whether all such columns provide filtering benefits AND does not consider the best column order from a compression perspective when determining index column order. So definitely room for improvement here methinks.

If we look at the definition and size of the resultant Automatic Index:

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_COMPRESS';

INDEX_NAME                   AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_bkdhrsd29vd4f         YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       35451          10000000

SQL> select index_name, column_name, column_position
from user_ind_columns where table_name='BOWIE_COMPRESS' order by index_name, column_position;

INDEX_NAME                   COLUMN_NAME     COLUMN_POSITION
---------------------------- --------------- ---------------
SYS_AI_bkdhrsd29vd4f         CODE1                         1
SYS_AI_bkdhrsd29vd4f         CODE2                         2
SYS_AI_bkdhrsd29vd4f         CODE3                         3
SYS_AI_bkdhrsd29vd4f         STATUS                        4

 

We note the index has 35451 leaf blocks.

If we were to create the index manully in a more appropriate manner from a compression perspective, with the index columns defined in reverse order and also with another index without the redundant STATUS column:

SQL> create index bowie_compress_best_order_i on bowie_compress(status, code3, code2, code1) compress advanced low;

Index created.

SQL> create index bowie_compress_best_order2_i on bowie_compress(code3, code2, code1) compress advanced low;

Index created.

SQL> select index_name, auto, constraint_index, visibility, compression, status, num_rows, leaf_blocks, clustering_factor
from user_indexes where table_name='BOWIE_COMPRESS';

INDEX_NAME                   AUT CON VISIBILIT COMPRESSION   STATUS     NUM_ROWS LEAF_BLOCKS CLUSTERING_FACTOR
---------------------------- --- --- --------- ------------- -------- ---------- ----------- -----------------
SYS_AI_bkdhrsd29vd4f         YES NO  VISIBLE   ADVANCED LOW  VALID      10000000       35451          10000000
BOWIE_COMPRESS_BEST_ORDER_I  NO  NO  VISIBLE   ADVANCED LOW  VALID      10000000       23509          10000000
BOWIE_COMPRESS_BEST_ORDER2_I NO  NO  VISIBLE   ADVANCED LOW  VALID      10000000       23462          10000000

 

We notice the resultant indexes are substantially smaller, at just 23509 and 23462 leaf blocks respectively.

It could well be that Index Compression is not yet documented because Oracle appreciates that more work yet needs to be done to create indexes optimally from a compression perspective…

Enable Index To Search For NULLs By Adding Constant to Index List. But Some Constants Better Than Others (Never Let Me Down) November 15, 2018

Posted by Richard Foote in Block Dumps, Index Internals, Indexing NULLs, Leaf Blocks, Oracle Indexes.
9 comments

Never Let Me Down

By default, Oracle doesn’t index an entry if all columns within the index are NULL. However, (as I’ve blogged previously), it’s possible to index all possible NULL values by simply adding a constant value to the index column list. Importantly, the CBO knows when a column has all it’s NULL values indexed and can potentially use the index accordingly.

However, the point of the article is to simple highlight that some constant values are better to use in this scenario than others…

A simple example to illustrate. First, create a table with both the CODE and GRADE columns nullable:

SQL> create table bowie (id number, code number, grade number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,100), mod(rownum,1000), 'DAVID
BOWIE' from dual connect by level 1000000;

999999 rows created.

SQL> insert into bowie values (1000000, null, null, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

If we create an index on both CODE and GRADE columns:

SQL> create index bowie_code_grade_i on bowie(code, grade);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE');

PL/SQL procedure successfully completed.

And then run a query looking for any CODE with a NULL value:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       |    1 |    24 |    1115 (3) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE |    1 |    24 |    1115 (3) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE" IS NULL)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
4030 consistent gets
   0 physical reads
   0 redo size
 784 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

We notice the CBO performs a Full Table Scan even though the CBO knows there’s likely only one row that meets the criteria. Not matter what we do, hint the query, beg, whatever, it’s impossible for the CBO to use the index because the null row is simply not indexed.

Now we come to the rub of the post.

A common recommendation is to simply add a constant to the column list. A constant is always present and associated NULL values are indexed if another index column has a corresponding Non-NULL value. Adding a constant value to the index column list guarantees all NULL values for all index columns must always be present within the index. The CBO recognises this and can therefore potentially use the index to fetch the required NULL values.

However, a common recommendation is also to use a number as the constant. There was a recent tweet I saw a few days ago that had the following example of using the number 1 as the constant value:

SQL> create index bowie_code_grade_i_2 on bowie (code, grade, 1);

Index created.

When we run the query again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

The index is indeed used to return the row with the NULL value of interest and only 4 consistent gets are performed.

So what’s the problem?

Nothing, except that perhaps a better constant might have been used, such as say a single space:

SQL> create index bowie_code_grade_i_3 on bowie(code, grade, ' ');

Index created.

If we run the query yet again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We get the exact same performance. So, what’s the point?

Well, if we look at the size of the corresponding indexes:

SQL> select index_name, leaf_blocks from user_indexes where table_name='BOWIE';

INDEX_NAME                LEAF_BLOCKS
------------------------- -----------
BOWIE_CODE_GRADE_I               2490
BOWIE_CODE_GRADE_I_2             2908
BOWIE_CODE_GRADE_I_3             2769

We notice the original index has the smallest size as expected, as it doesn’t have to index the constant value. But then we notice that the index with the constant value as the number is somewhat larger than the index with the constant value as a space.

Why?

An index block dump of both indexes will highlight why:

First a partial leaf block dump of index with the “1” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 399
kdxcofbo 834=0x342
kdxcofeo 1652=0x674
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29444101=0x1c14805
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8020] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8004] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7988] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 0d 00 af

Next, a partial leaf block dump of index with the space ” ” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 422
kdxcofbo 880=0x370
kdxcofeo 1706=0x6aa
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 29447173=0x1c15405
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8006] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7991] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 0d 00 af

We notice that the number requires 2 bytes, while the space only requires 1 byte.

So storing the constant as a single byte character, rather a 2 byte number is a free saving, which adds up with each and every index entry, by simply being a little more fastidious when selecting which constant value to use in this scenario.

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part II (Aladdin Sane) October 9, 2018

Posted by Richard Foote in Global Indexes, Index Internals, Local Indexes, Oracle Indexes, Partitioned Indexes, Partitioning, ROWID.
2 comments

aladdin sane

In Part I of this series, I highlighted how a Non-Partitioned Global Index on a Partitioned Table is able to effectively perform “Partition Pruning” by reading only the associated index entries to access just the table blocks of interest from relevant table partitions when the table partitioned keys are specified in an SQL Predicate.

Understanding how Oracle achieves this is key (pun fully intended) in understanding the associated advantages of Global Indexes.

Back in time before Oracle introduced Partitioning (pre-Oracle 8 days), the 6 byte ROWID was safely made up of the following components:

  • File Number
  • Block Number
  • Row Number

to uniquely determine the location of any given row.

If we look at a partial block dump of a leaf block from the index based on the Non-Partitioned table:

Leaf block dump
===============
header address 1881436260=0x70247064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29387269=0x1c06a05
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 c0 1d 68 00 18
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 c0 24 c8 00 c1
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 c0 3a 1c 00 96

We notice that the ROWID for each index entry is the standard 6 bytes in size.

With the introduction of Oracle 8 and the Partitioning Option, the File Number was no longer unique, with this number of files (approx. 1K) now possible not for the database at large, but for each Tablespace (thus making Oracle able to cater for very large databases with there now being the option for so many more data files in a database).

This means for a Partitioned Table in which each table partition (or sub-partition) could potentially reside in different tablespaces, the associated file number (RELATIVE_FNO) within the ROWID is no longer unique. Therefore, for Global Indexes in which index entries span across all table partitions, the ROWID is extended to include the 4 byte Data Object Id. A specific object can only live in one tablespace and if Oracle knows the tablespace, Oracle can determine which specific file number the ROWID is referencing. So an extended ROWID is consists of:

  • Data Object Id
  • File Number
  • Block Number
  • Row Number

If we look at a partial block dump of a leaf block from the index based on the Partitioned table:

Leaf block dump
===============
header address 1881436260=0x70247064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 399
kdxcofbo 834=0x342
kdxcofeo 1652=0x674
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29385221=0x1c06205
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8020] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 10; (10): 00 01 57 4a 01 c0 5e cf 00 cc
row#1[8004] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 10; (10): 00 01 57 4a 01 c0 5f 74 00 e7
row#2[7988] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 10; (10): 00 01 57 4b 01 c0 5c 32 00 c9

We notice that the ROWID for each index entry is now the extended 10 bytes in size as it includes the Data Object Id.

Storing the Data Object Id as part of the ROWID has various advantages, such as being able to asynchronously maintain index entries following table partition operations such as dropping a table partition (as discussed previously here).

However the key advantage of storing the Data Object Id as part of the ROWID is that this enables Oracle when using Global Indexes to automatically perform “Partition Pruning” (the ability to access only those partitions that can possibly contain data of interest), when the table partition key is specified in an SQL predicate.

When the table partition key is specified in an SQL predicate, Oracle can determine which table partitions can only contain such data and then only access the table blocks via the index ROWIDs that have corresponding Data Object Ids of interest. This is how in the example in Part I Oracle was able to only access just the table block that belongs in the table partition of interest, effectively performing predicate filtering at the index level, without unnecessarily having to access the table blocks at all from partitions that are not of interest.

This enables Global Indexes to have almost Local Index like performance in scenarios where the table partition key is specified in SQL predicates. Local Indexes do have the advantage of potentially having a reduced BLEVEL in that if you have say 100 table partitions, each Local Index would only have to be approx. 1/100 the size of the single, Non-Partitioned Index (although Global Indexes can in turn be partitioned if individual index size were problematic, even if the table were not partitioned). Additionally, Local Indexes don’t have to concern themselves with having to read through unnecessary index entries if index entries associated with a specific subset of table partitions were only of interest.

However, Global Indexes have a key performance advantage over Local Indexes which I’ll discussed in Part III.

 

Index Column Order – Impact On Index Branch Blocks Part I (Day-In Day-Out) June 4, 2018

Posted by Richard Foote in Block Dumps, Branch Blocks, Index Branches, Index Column Order, Index Compression, Index Internals, Oracle Indexes.
15 comments

day in day out bowie

I recently replied on Twitter to some comments regarding an excellent blog post by Franck Pachot – Covering indexes in Oracle, and branch size, where I disagreed somewhat with one of the conclusions stated in the post:

ensure that selective columns appear as early as possible (without compromising the index access efficiency of course) in order to lower the bytes required to address branches and leaves“.

Based on the Twitter discussion, the post was updated on 14 April 2018 with an additional clarification that putting the most selective indexed column first is a “common misconception“.

I’ve written a number of times about index column order, including this post that’s now some 10 years old – “It’s Less Efficient To Have Low Cardinality Leading Columns In An Index (Right) ?“. The summary being that it generally makes no appreciable difference to the performance of an index in which order you position the columns in an index, if all index columns are referenced equality type SQL predicates. I thought it might be worth revisiting this topic, with a new example that discusses why I specifically disagree with the notion of putting the most selective columns first, despite the possible impact on Index Branches.

I’ll begin with a simple table that has 2 columns of interest, the ID which is effectively unique and the CODE column which is “relatively” large in size but only has 5 distinct values:

SQL> CREATE TABLE ziggy AS
SELECT rownum id, 'SOME LARGE OFTEN REPEATED VALUE ' || mod(rownum,5) code, 'ZIGGY' name
FROM dual CONNECT BY LEVEL <= 2000000;

Table created.

I'll next create a concatenated index based on both the ID and CODE columns, with the highly selective ID column leading:

SQL> create index ziggy_id_code_i ON ziggy(id, code);

Index created.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14135         23      176612   113264736  101146313

So we notice the index has a Height of 3, with a total of 23 Index Branch blocks. There are a total of 14,135 leaf blocks.

If we look at a partial block dump of a Branch block:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 21
kdxcofbo 70=0x46
kdxcofeo 7840=0x1ea0
kdxcoavs 7770
kdxbrlmc 29440826=0x1c13b3a
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8050] dba: 29441507=0x1c13de3
col 0; len 4; (4): c3 0a 45 4e
col 1; TERM
row#1[8040] dba: 29442190=0x1c1408e
col 0; len 4; (4): c3 14 1b 58
col 1; TERM
row#2[8030] dba: 29442871=0x1c14337
col 0; len 4; (4): c3 1d 55 62
col 1; TERM

We can see that each entry in the Index Branch only contains the leading ID column. That’s because the column is so selective that it provides all the necessary data to determine the exact Leaf Block location of any given indexed value. The following columns (CODE and ROWID) do not provide any additional useful information and would be redundant if stored. Therefore each Index Branch entry is shown with a TERM value, meaning that subsequent indexed values are not stored within the Index Branch.

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=4242 AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_ID_CODE_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access(("ID"=4 OR "ID"=42 OR "ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242)
AND "CODE"='SOME LARGE OFTEN REPEATED VALUE 2')

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We note for now the number of consistent gets (5 and 19) for each of these queries.

If we now create another index, but this time with the columns the other way around and so with the very unselective CODE column leading:

SQL> create index ziggy_code_id_i on ziggy(code,id);

Index created.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14125         83      656341   113666656  101626042

So the number of Index Branch blocks has increased from 23 to 83 compared to the other index (although the number of Leaf Blocks are almost the same). Note that at 83, the percentage of branch blocks to leaf blocks is still tiny, just 0.06%.

The reason for the greater number of Index Branches can be seen with a partial index block dump of an Index Branch:

Branch block dump
=================
header address 508428364=0x1e4e004c
kdxcolev 2
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 3
kdxcosdc 0
kdxconro 81
kdxcofbo 190=0xbe
kdxcofeo 4458=0x116a
kdxcoavs 4268
kdxbrlmc 29440318=0x1c1393e
kdxbrsno 0
kdxbrbksz 8060
kdxbr2urrc 0
row#0[8016] dba: 29440496=0x1c139f0
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 0d 3d 38
col 2; TERM
row#1[7972] dba: 29440676=0x1c13aa4
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 1a 0c 51
col 2; TERM
row#2[7928] dba: 29440854=0x1c13b56
col 0; len 33; (33):
53 4f 4d 45 20 4c 41 52 47 45 20 4f 46 54 45 4e 20 52 45 50 45 41 54 45 44
20 56 41 4c 55 45 20 30
col 1; len 4; (4): c3 26 40 06
col 2; TERM

With the larger CODE column now leading, the column must therefore be stored within the Branch Block. However, as this column is so unselective with just 5 distinct values (notice how the same col 0 CODE value is repeated for each of the displayed branch entries), it’s not sufficient on its own to ensure the navigation down to the first leaf block containing the required index entry. Therefore, the next column (the highly selective col 1 ID column) is also necessary as part of each branch entry.

The branch entry with both the CODE and ID columns has ranges sufficiently selective enough to ensure any indexed value can be found within leaf blocks. Therefore the third column (the Rowid) is not required and is marked with the TERM value in the block dump.

So on the surface, it looks as if this index is not as efficient as there are indeed more Index Branches within the index. However, during a typical index range scan, only one branch block is accessed for each level index branches exist. Unless we can reduce the number of branch blocks required at a specific level to just one branch block thereby reducing the height/blevel of an index (an extremely rare edge case), having more branches as in this example makes no appreciable difference to the efficiency of the index.

If we run the same queries as we did when using the previous index:

SQL> SELECT * FROM ziggy WHERE id = 4242 and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
-----------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |        4(0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |        4(0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |        3(0) |  00:00:01 |
-----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND "ID"=4242)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
713 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

SQL> SELECT * FROM ziggy WHERE id in (4, 42, 424, 4242, 42424, 424242) and code = 'SOME LARGE OFTEN REPEATED VALUE 2';

Execution Plan
------------------------------------------------------------------------------------------------
| Id | Operation                   | Name            | Rows | Bytes | Cost (%CPU) | Time       |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                 |    1 |    45 |       9 (0) |   00:00:01 |
|  1 | INLIST ITERATOR             |                 |      |       |             |            |
|  2 | TABLE ACCESS BY INDEX ROWID | ZIGGY           |    1 |    45 |       9 (0) |   00:00:01 |
|* 3 | INDEX RANGE SCAN            | ZIGGY_CODE_ID_I |    1 |       |       8 (0) |   00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"='SOME LARGE OFTEN REPEATED VALUE 2' AND ("ID"=4 OR "ID"=42 OR
"ID"=424 OR "ID"=4242 OR "ID"=42424 OR "ID"=424242))

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 19 consistent gets
  0 physical reads
  0 redo size
861 bytes sent via SQL*Net to client
608 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  3 rows processed

We notice the number of consistent gets remains exactly the same, with the additional branch blocks making no appreciable difference to the performance of the index.

So the column order, providing all index columns are referenced with equality type SQL predicates, makes no real difference to the performance of the index. In both cases, there are enough columns referenced in the branch blocks to always point down to the first index leaf block that contains the first index entry of interest.

In Part II, we’ll see how having the unselective column as the leading column of the index can actually make an appreciable positive difference to the index.

Indexing Seminar: Why Small Classes? (A Small Plot Of Land) October 5, 2017

Posted by Richard Foote in Index Internals, Index Internals Seminar, Oracle Indexes, Richard Foote Consulting, Richard's Musings.
4 comments

outside2

Firstly, a huge thank-you to everyone who has enroled in one of my “Indexing Internals and Best Practices” Seminars running soon is Australia. The feedback and overall reaction has been extremely positive and I can’t wait to present all the material I’ve been putting together these past few weeks.

There are currently limited places still available at each of the 6 venues, but places are selling out fast. All enrolment details can be found here: https://www.eventbrite.com.au/o/richard-foote-consulting-pty-ltd-15076769762

A question I’ve been asked a couple of times is why the insistence on only running small classes? I’ve previously run similar events overseas to large audiences, I’ve had classes of 40+ a number of times in Europe and I’ve run a one day version of this at Hotsos last year to more people than I could count (I’m guessing 80+). And while they were all highly successful, I will be restricting classes to much smaller class sizes in this Australia tour and into the future.  There are a number of reasons for this, with the key points being:

  • The seminar has been fully revised and updated and is packed with heaps of new content. I want to be able to cover as much material as is practical and this is only possible with small classes. The more I cover, the more attendees have the opportunity to learn.
  • I really want these seminars to be interactive events, with attendees having plenty of opportunity to ask questions. This makes it a more valuable experience as any ambiguities, doubts, environment specific questions, important bits that people don’t get first time, etc. etc. can clarified and addressed.
  • The seminar goes for 2 days, but if I had 50 people and each person asked 15 mins of questions that needed answering, that’s almost the entire time gone. So larger class sizes makes it impractical to have any meaningful Q&A
  • Attendees bring vast amounts of experiences and solutions to Oracle Indexing related scenarios as all Oracle databases out there utilise indexes. Even on Exadata. Even in future autonomous databases. Small classes gives me the opportunity to tap into this.
  • I want to learn just as much as everybody else and if I do all the talking, I’ll learn nothing new. Small classes gives me the opportunity to learn, based on questions asked and experiences shared.
  • All attendees have their own requirements and expectations from attending the seminar. If someone has specific issues at their environments, what a wonderful opportunity to learn and perhaps get these issues addressed by attending the seminar. Again only practical in small class environments.
  • I will be contacting all attendees BEFORE the seminars to determine what their individual expectations are from the seminars. I will then ensure that any appropriate material is fully covered to meet those expectations. This again is only practical with small class sizes.
  • Finally, you’re spending 2 days with the one and only Richard Foote 🙂 As well as learning more about indexes than you thought possible, as well as learning how an appropriate indexing strategy can dramatically improve database/application performance, as well as discovering an appreciations for the works of David Bowie, it’s also an opportunity to have a sense of 1 on 1 with myself  and that I’m there to directly assist “you” with truly understanding how Oracle indexes should be properly utilised. There is little sense of 1 on 1 with large class sizes.

 

So these seminars are designed very intentionally to be “premium” events, offering a training experience that really is second to none. These seminars have always been very well received so I’m truly excited that I’ll be running these events with the best content I’ve ever compiled. Small class settings gives all attendees the very best opportunity to absorb it all in and apply directly what they’re learnt to improving the performance of their database/application environments..

Like I said, there are only limited places now remaining for the Australian Tour so get in early or else risk missing out.

Events in other parts of our beautiful planet are currently being planned and will be announced in the near future. If you have any questions or requests on where to run these seminars, just ask: richard@richardfooteconsulting.com

Why A Brand New Index Might Benefit From An Immediate Coalesce (One Slip) July 6, 2015

Posted by Richard Foote in ASSM, Index Block Splits, Index Internals, Insert Append, Oracle Indexes, Tree Dumps, Truncate Indexes.
2 comments

A recent question on the OTN Forums Reg: Index – Gathering Statistics vs. Rebuild got me thinking on a scenario not unlike the one raised in the question where a newly populated index might immediately benefit from a coalesce.

I’ve previously discussed some of the pertinent concepts such as how index rebuilds can make indexes bigger, not smaller and some of the issues around 90-10 block splits not occurring.

Let me show you a scenario where a newly populated index might benefit from an immediate coalesce.

As usual, I start with my little Bowie table and index on the ID column:

SQL> create table bowie (id number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_id_i on bowie(id);

Index created.

If we look at the current statistics on table and index:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
1000000       3142            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
2        2226    1000000

We note the index currently has 2226 leaf blocks. A look at a partial tree dump of the index:

—– begin tree dump
branch: 0x1834723 25380643 (0: nrow: 4, level: 2)
branch: 0x18340b6 25378998 (-1: nrow: 672, level: 1)
leaf: 0x1834724 25380644 (-1: row:485.485 avs:828)
leaf: 0x1834725 25380645 (0: row:479.479 avs:820)
leaf: 0x1834726 25380646 (1: row:479.479 avs:820)
leaf: 0x1834727 25380647 (2: row:479.479 avs:820)
leaf: 0x1834728 25380648 (3: row:479.479 avs:820)
leaf: 0x1834729 25380649 (4: row:479.479 avs:819)
leaf: 0x183472a 25380650 (5: row:479.479 avs:820)
leaf: 0x183472b 25380651 (6: row:479.479 avs:820)
leaf: 0x183472c 25380652 (7: row:479.479 avs:820)
leaf: 0x183472d 25380653 (8: row:479.479 avs:819)
leaf: 0x183472e 25380654 (9: row:479.479 avs:820)
leaf: 0x183472f 25380655 (10: row:479.479 avs:820)

Shows us that the index is fully populated with just the default 10% pctfree of free space (most leaf blocks have an avs of 819/820 free bytes).

The forum question mentions a scenario where 70% of the table is archived away. This is done by storing in a temporary table the 30% of required data, followed by a truncate of the original table and the 30% of data being re-inserted. Something like the following:

SQL> create table bowie_temp as select * from bowie where id > 700000;

Table created.

SQL> truncate table bowie;

Table truncated.

So we store in the temp table all values with an ID > 700000. When a table is truncated, so are all the associated indexes. If we performed a tree dump of the index straight after the truncate:

—– begin tree dump
leaf: 0x1834723 25380643 (0: row:0.0 avs:8000)
—– end tree dump

We can see the index consists now of nothing but an empty leaf block.

If we now re-insert the 30% of data of interest and collect fresh statistics:

SQL> insert into bowie select * from bowie_temp;

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
300000       1000            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
2        1112     300000

We note the number of leaf blocks has effectively halved to just 1112 leaf blocks down from 2226 leaf blocks.

But isn’t this index somewhat larger than it should be ? If we removed 70% of data, why has the index just reduced by 50% ?

If we look at a partial tree dump of the index now:

branch: 0x1834723 25380643 (0: nrow: 3, level: 2)
branch: 0x18335de 25376222 (-1: nrow: 332, level: 1)
leaf: 0x1834727 25380647 (-1: row:255.255 avs:3922)
leaf: 0x18004f6 25167094 (0: row:255.255 avs:3923)
leaf: 0x18004f2 25167090 (1: row:255.255 avs:3922)
leaf: 0x18004f3 25167091 (2: row:255.255 avs:3923)
leaf: 0x18004f4 25167092 (3: row:255.255 avs:3922)
leaf: 0x1800505 25167109 (4: row:449.449 avs:821)
leaf: 0x18004f5 25167093 (5: row:246.246 avs:4066)
leaf: 0x18004f1 25167089 (6: row:246.246 avs:4067)
leaf: 0x1834724 25380644 (7: row:500.500 avs:5)
leaf: 0x1834725 25380645 (8: row:500.500 avs:5)
leaf: 0x1834726 25380646 (9: row:500.500 avs:5)
leaf: 0x18004f7 25167095 (10: row:500.500 avs:5)
leaf: 0x18004f0 25167088 (11: row:255.255 avs:3922)
leaf: 0x1833d8c 25378188 (12: row:255.255 avs:3923)
leaf: 0x18331ed 25375213 (13: row:255.255 avs:3922)
leaf: 0x18331fd 25375229 (14: row:255.255 avs:3923)
leaf: 0x18331fa 25375226 (15: row:255.255 avs:3922)
leaf: 0x18331c8 25375176 (16: row:255.255 avs:3923)
leaf: 0x18331c9 25375177 (17: row:253.253 avs:3954)
leaf: 0x18331cd 25375181 (18: row:255.255 avs:3923)
leaf: 0x18331d4 25375188 (19: row:255.255 avs:3923)
leaf: 0x18331e0 25375200 (20: row:255.255 avs:3922)

We notice that vast areas of the index now has 50% of free space, not the default 10% it had previously.

The “problem” here is that when data was stored in the temp table, there was nothing to guarantee that the data be physically stored in the same order as the ID column. ASSM tablespaces will effectively pick random free blocks below the high water mark of the table so that although data might well be ordered within a block, the blocks within the table might not be logically ordered with respect to the data being inserted.

A simple select from the temp table displaying the “first” 20 rows of the table will illustrate this:

SQL> select * from bowie_temp where rownum <=20;

ID NAME
---------- ------------------------------------------
701717 DAVID BOWIE
701718 DAVID BOWIE
701719 DAVID BOWIE
701720 DAVID BOWIE
701721 DAVID BOWIE
701722 DAVID BOWIE
701723 DAVID BOWIE
701724 DAVID BOWIE
701725 DAVID BOWIE
701726 DAVID BOWIE
701727 DAVID BOWIE
701728 DAVID BOWIE
701729 DAVID BOWIE
701730 DAVID BOWIE
701731 DAVID BOWIE
701732 DAVID BOWIE
701733 DAVID BOWIE
701734 DAVID BOWIE
701735 DAVID BOWIE
701736 DAVID BOWIE

Note that the first selected range of values starts with 701717 and not 700001.

Therefore, when the data is loaded back within the table, the data is not necessarily ordered as it was previously and an insert into a full leaf block might not necessarily have the largest ID column currently within the table/index. That being the case, a 50-50 block split is performed rather than the 90-10 block splits that only occur when it’s the largest indexed value being inserted into a full leaf block. 90-10 block splits leaves behind nice full leaf blocks, but 50-50 block splits leaves behind 1/2 emptied blocks that don’t get filled if they don’t house subsequent inserts.

The 50-50 leaf block split is resulting in a larger index but most importantly, these areas of free space are not going to be used by subsequent monotonically increasing ID index values.

To reclaim this effectively unusable index storage, the index would benefit from an immediate coalesce.

Of course, the best cure would be to prevent this scenario from occurring in the first place.

One option would be to insert the data in a manner that ensures the effective ordering of the indexed data:

SQL> truncate table bowie;

Table truncated.

SQL> insert into bowie select * from bowie_temp order by id;

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
300000       1000            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
1         600     300000

We notice by inserting the data in ID order within the table (via the “order by” clause), we guarantee that an insert into a full leaf block will indeed be as a result of the highest current index entry and that 90-10 block splits are performed. This leaves behind perfectly full leaf blocks and a nice, compact index structure. The index now only has 600 leaf blocks, significantly less than before with no “wasted” storage.

A partial tree dump of the index highlights this:

—– begin tree dump
branch: 0x1834723 25380643 (0: nrow: 600, level: 1)
leaf: 0x1834727 25380647 (-1: row:500.500 avs:5)
leaf: 0x1834724 25380644 (0: row:500.500 avs:5)
leaf: 0x1834725 25380645 (1: row:500.500 avs:5)
leaf: 0x1834726 25380646 (2: row:500.500 avs:5)
leaf: 0x18004f7 25167095 (3: row:500.500 avs:5)
leaf: 0x18004f0 25167088 (4: row:500.500 avs:5)
leaf: 0x18004f1 25167089 (5: row:500.500 avs:5)
leaf: 0x18004f5 25167093 (6: row:500.500 avs:5)
leaf: 0x18004f6 25167094 (7: row:500.500 avs:5)
leaf: 0x18004f2 25167090 (8: row:500.500 avs:5)
leaf: 0x18004f3 25167091 (9: row:500.500 avs:5)
leaf: 0x18004f4 25167092 (10: row:500.500 avs:5)

The index leaf blocks are chock-a-block full with just 5 bytes free, not enough space for another index entry.

Another alterative would of course be to make the indexes unusable before re-inserting data into the data and rebuild the index later with a pctfree of 0, a safe and appropriate value for monotonically increasing values. This has the added advantage of significantly improving the performance of the bulk insert operation.

If the data isn’t monotonically increasing, then 50-50 block splits are fine as the resultant free space would indeed be effectively used.

Update: 7 July 2015

And as Jonathan Lewis kindly reminded me, another method to avoid this issue is to simply use an insert Append. This will record all the key entries as it goes, sort them and populate the index much more efficiently in one step:

SQL> truncate table bowie;

Table truncated.

SQL> insert /*+ append */ into bowie select * from bowie_temp;

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
300000        936            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
1         600     300000

So at 600 leaf blocks, the index is again populated within a fully compacted index structure.

Empty Leaf Blocks After Rollback Part II (Editions of You) June 24, 2015

Posted by Richard Foote in Block Dumps, Index Internals, Oracle Indexes, Tree Dumps, Unique Indexes, Update Indexes.
5 comments

In my last post, I discussed how both 1/2 empty and totally empty leaf blocks can be generated by rolling back a bulk update operation.

An important point I made within the comments of the previous post is that almost the exact scenario would have taken place had the transaction committed rather than rolled back. A commit would also have resulted with the leaf blocks being 1/2 empty in the first example (with the previous index entries now all marked as deleted) and with effectively empty leaf blocks in the second example (with the previous leaf blocks all now containing index entries marked as deleted). The important aspect here is not the rollback but the fact that update statements result in the deletion of the previous indexed value and the re-insertion of the new value. (BTW, it’s always a useful exercise to read through the comments on this blog as this is often where some of the best learning takes place due to some of the really nice discussions) 🙂

That said, the previous post used a Non-Unique index. Let’s now repeat the same scenario but this time use a Unique Index instead.

So let’s start with another table with the same data but this time with a unique index on the ID column:

SQL> create table ziggy (id number, name varchar2(42));

Table created.

SQL> insert into ziggy select rownum, 'DAVID BOWIE' from dual connect by level <=10000;

10000 rows created.

SQL> commit;

Commit complete.

SQL> create unique index ziggy_id_i on ziggy(id);

Index created.

 

OK, let’s have a look at a tree dump of this index:

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 20, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
—– end tree dump

Now we notice a bit of a difference already. Here, the index consists of 20 leaf blocks with 513 index entries in most leaf blocks whereas the non-unique index had 21 leaf blocks and just 479 index entries per leaf block. One of the advantages of unique indexes over non-unique as I’ve discussed previously.

Let’s now perform our first bulk update where I increment the ID of each value by 1:

SQL> update ziggy set id=id+1;

10000 rows updated.

SQL> rollback;

Rollback complete.

Now with the non-unique index, this resulted in the index doubling in size as we created an additional index entry for each and every row. After the rollback, we were effectively left with an index that not only was twice the size but had only 1/2 empty leaf blocks.

With a unique index though, things differ. The most important characteristic of a unique index of course is that each index value can only ever exist once, each index entry must be unique. So for a unique index, the rowid is not actually part of the indexed column list, but treated as additional “overhead” or metadata associated with the index entry.

When we perform our update here, we’re effectively replicating each value, except for the very last ID value where 10001 doesn’t exist. But with the first row, when the ID=1 becomes 2 after the update, we already have an index entry with an ID value of 2 (the second row). So Oracle can mark the first index entry as deleted (as ID=1 no longer exists) but rather than insert a new index entry simply update the rowid associated with the unique index entry with the ID of 2. Oracle then updates the rowid of the index entry with a value of 3 with the rowid of that previously referenced ID=2 . And so on and so on for all the other index entries except for index value 100001 which has to be inserted as it didn’t previously exist. So Oracle nicely maintains the consistency of the index during the single update operation by effectively recycling the existing index entries.

The net result is that the index remains the same size as the index entries are not reinserted as they are for a non-unique index. The effective change that occurs during this update is that the first index entry is marked as deleted and one new index entry is added at the very end.

If we look at a partial block dump of the first leaf block before the rollback operation:

Leaf block dump
===============
header address 375991908=0x16692e64
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 1
kdxcosdc 0
kdxconro 520
kdxcofbo 1076=0x434
kdxcofeo 1900=0x76c
kdxcoavs 824
kdxlespl 0
kdxlende 1
kdxlenxt 25166205=0x180017d
kdxleprv 0=0x0
kdxledsz 6
kdxlebksz 8036
row#0[8025] flag: —D—, lock: 2, len=11, data:(6):  01 80 01 57 00 00
col 0; len 2; (2):  c1 02
row#1[8014] flag: ——-, lock: 2, len=11, data:(6):  01 80 01 57 00 00
col 0; len 2; (2):  c1 03
row#2[8003] flag: ——-, lock: 2, len=11, data:(6):  01 80 01 57 00 01
col 0; len 2; (2):  c1 04
row#3[7992] flag: ——-, lock: 2, len=11, data:(6):  01 80 01 57 00 02
col 0; len 2; (2):  c1 05
row#4[7981] flag: ——-, lock: 2, len=11, data:(6):  01 80 01 57 00 03
col 0; len 2; (2):  c1 06
row#5[7970] flag: ——-, lock: 2, len=11, data:(6):  01 80 01 57 00 04
col 0; len 2; (2):  c1 07

We notice that the first index entry is marked as deleted (as we now no longer have an ID=1) but all the other index entries have been “recycled” with their updated rowids. Note how the rowid of the deleted index entry (01 80 01 57 00 00) is now associated with the second index entry (which is effectively now the first index entry now).

If we look at a tree dump after the rollback was performed:

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 20, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
—– end tree dump

We notice that it’s exactly the same size as before and we don’t have the same issues with a bloated index as we did in the previous non-unique index example.

However, if we perform the second update which effectively changes all the ID values to those which don’t currently exist within the table:

SQL> update ziggy set id=id+10000;

10000 rows updated.

SQL> rollback;

Rollback complete.

Now Oracle can’t recycle the existing index entries as the new values don’t currently exist within the index. So Oracle is indeed forced to mark all the existing index entries as deleted and insert new index entries into the index. These new index entries all exist in the right hand most side of the index, resulting in 90-10 block splits with additional index leaf blocks being added to the index. If we rollback this transaction, it will result in all the new index entries being removed, leaving behind these new empty leaf blocks just as with the non-unique index example.

A new tree dump will confirm this:

—– begin tree dump
branch: 0x180017b 25166203 (0: nrow: 47, level: 1)
leaf: 0x180017c 25166204 (-1: row:520.520 avs:824)
leaf: 0x180017d 25166205 (0: row:513.513 avs:823)
leaf: 0x180017e 25166206 (1: row:513.513 avs:823)
leaf: 0x180017f 25166207 (2: row:513.513 avs:823)
leaf: 0x18004c8 25167048 (3: row:513.513 avs:823)
leaf: 0x18004c9 25167049 (4: row:513.513 avs:823)
leaf: 0x18004ca 25167050 (5: row:513.513 avs:823)
leaf: 0x18004cb 25167051 (6: row:513.513 avs:824)
leaf: 0x18004cc 25167052 (7: row:513.513 avs:823)
leaf: 0x18004cd 25167053 (8: row:513.513 avs:823)
leaf: 0x18004ce 25167054 (9: row:513.513 avs:823)
leaf: 0x18004cf 25167055 (10: row:513.513 avs:823)
leaf: 0x18004d1 25167057 (11: row:513.513 avs:823)
leaf: 0x18004d2 25167058 (12: row:513.513 avs:823)
leaf: 0x18004d3 25167059 (13: row:513.513 avs:824)
leaf: 0x18004d4 25167060 (14: row:513.513 avs:823)
leaf: 0x18004d5 25167061 (15: row:513.513 avs:823)
leaf: 0x18004d6 25167062 (16: row:513.513 avs:823)
leaf: 0x18004d7 25167063 (17: row:513.513 avs:823)
leaf: 0x18004d8 25167064 (18: row:246.246 avs:4559)
leaf: 0x18004d9 25167065 (19: row:0.0 avs:8000)
leaf: 0x18004da 25167066 (20: row:0.0 avs:8000)
leaf: 0x18004df 25167071 (21: row:0.0 avs:8000)
leaf: 0x18004dd 25167069 (22: row:0.0 avs:8000)
leaf: 0x18004de 25167070 (23: row:0.0 avs:8000)
leaf: 0x18004db 25167067 (24: row:0.0 avs:8000)
leaf: 0x18004dc 25167068 (25: row:0.0 avs:8000)
leaf: 0x18004e5 25167077 (26: row:0.0 avs:8000)
leaf: 0x18004e6 25167078 (27: row:0.0 avs:8000)
leaf: 0x18004e7 25167079 (28: row:0.0 avs:8000)
leaf: 0x18004e4 25167076 (29: row:0.0 avs:8000)
leaf: 0x18004ed 25167085 (30: row:0.0 avs:8000)
leaf: 0x18004ee 25167086 (31: row:0.0 avs:8000)
leaf: 0x18004ef 25167087 (32: row:0.0 avs:8000)
leaf: 0x18004e1 25167073 (33: row:0.0 avs:8000)
leaf: 0x18004e2 25167074 (34: row:0.0 avs:8000)
leaf: 0x18004e3 25167075 (35: row:0.0 avs:8000)
leaf: 0x18004e9 25167081 (36: row:0.0 avs:8000)
leaf: 0x18004ea 25167082 (37: row:0.0 avs:8000)
leaf: 0x18004eb 25167083 (38: row:0.0 avs:8000)
leaf: 0x18004ec 25167084 (39: row:0.0 avs:8000)
leaf: 0x18004f5 25167093 (40: row:0.0 avs:8000)
leaf: 0x18004f6 25167094 (41: row:0.0 avs:8000)
leaf: 0x18004f7 25167095 (42: row:0.0 avs:8000)
leaf: 0x18004f1 25167089 (43: row:0.0 avs:8000)
leaf: 0x18004e8 25167080 (44: row:0.0 avs:8000)
leaf: 0x18004f2 25167090 (45: row:0.0 avs:8000)
—– end tree dump

 

The index has indeed bloated in size as a result of the update. Note that the index would be the same size had the transaction committed, except that the leaf blocks that currently contain data would effectively be empty and contain nothing but deleted index entries while the empty leaf blocks would all contain the new indexed values.

So depending on the update operation, a unique index can potentially reuse existing index entries if the new column values existed previously in other rows. If not, then the usual delete/insert mechanism applies.