jump to navigation

Presenting at AUSOUG Connect 2018 Conference in Melbourne, 21 November 2018. November 18, 2018

Posted by Richard Foote in Connect 2018, Oracle Indexes.
add a comment

ausoug

 

After initially not being in a position to make it this year, I will now be presenting at the AUSOUG Connect 2018 Conference in Melbourne this coming Wednesday, 21 November 2018.

My presentation will be:

12c Release 2 and 18c – New Indexing Related Features

Oracle Database 12.2 and 18.3 releases have introduced a number of extremely useful new indexing features and enhancements. These include cool capabilities such as automatically tracking index usage, advanced index compression enhancements, deferred invalidation of cursors during index creation/rebuild, automatic index maintenance during new online operations (such as online table moves and conversion to partitioned objects), JSON indexing enhancements, Memoptimized Rowstore and Scalable Sequences. There might even be an Oracle Database 19c surprise thrown in.

These will all be discussed in detail with practical examples on how they can be usefully deployed to improve overall database performance.

 

For all the conference details, see: https://www.ausoug.org.au/whats-on/connect-2018/melbourne/

This conference has a fabulous lineup and I’m really excited at being able to now take part in it all. A special thank you to the AUSOUG team for catering for me at such late notice.

Looking forward to catching up with many of my Aussie Oracle mates there 🙂

 

ausoug

Advertisements

Enable Index To Search For NULLs By Adding Constant to Index List. But Some Constants Better Than Others (Never Let Me Down) November 15, 2018

Posted by Richard Foote in Block Dumps, Index Internals, Indexing NULLs, Leaf Blocks, Oracle Indexes.
6 comments

Never Let Me Down

By default, Oracle doesn’t index an entry if all columns within the index are NULL. However, (as I’ve blogged previously), it’s possible to index all possible NULL values by simply adding a constant value to the index column list. Importantly, the CBO knows when a column has all it’s NULL values indexed and can potentially use the index accordingly.

However, the point of the article is to simple highlight that some constant values are better to use in this scenario than others…

A simple example to illustrate. First, create a table with both the CODE and GRADE columns nullable:

SQL> create table bowie (id number, code number, grade number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,100), mod(rownum,1000), 'DAVID
BOWIE' from dual connect by level 1000000;

999999 rows created.

SQL> insert into bowie values (1000000, null, null, 'ZIGGY STARDUST');

1 row created.

SQL> commit;

Commit complete.

If we create an index on both CODE and GRADE columns:

SQL> create index bowie_code_grade_i on bowie(code, grade);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BOWIE');

PL/SQL procedure successfully completed.

And then run a query looking for any CODE with a NULL value:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       |    1 |    24 |    1115 (3) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE |    1 |    24 |    1115 (3) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE" IS NULL)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
4030 consistent gets
   0 physical reads
   0 redo size
 784 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

We notice the CBO performs a Full Table Scan even though the CBO knows there’s likely only one row that meets the criteria. Not matter what we do, hint the query, beg, whatever, it’s impossible for the CBO to use the index because the null row is simply not indexed.

Now we come to the rub of the post.

A common recommendation is to simply add a constant to the column list. A constant is always present and associated NULL values are indexed if another index column has a corresponding Non-NULL value. Adding a constant value to the index column list guarantees all NULL values for all index columns must always be present within the index. The CBO recognises this and can therefore potentially use the index to fetch the required NULL values.

However, a common recommendation is also to use a number as the constant. There was a recent tweet I saw a few days ago that had the following example of using the number 1 as the constant value:

SQL> create index bowie_code_grade_i_2 on bowie (code, grade, 1);

Index created.

When we run the query again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

The index is indeed used to return the row with the NULL value of interest and only 4 consistent gets are performed.

So what’s the problem?

Nothing, except that perhaps a better constant might have been used, such as say a single space:

SQL> create index bowie_code_grade_i_3 on bowie(code, grade, ' ');

Index created.

If we run the query yet again:

SQL> select * from bowie where code is null;

        ID       CODE      GRADE NAME
---------- ---------- ---------- ------------------------------------------
   1000000                       ZIGGY STARDUST

Execution Plan
----------------------------------------------------------
Plan hash value: 3086372235

------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                 | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                      |    1 |    24 |       4 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE                |    1 |    24 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_CODE_GRADE_I_2 |    1 |       |       3 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE" IS NULL)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
784 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We get the exact same performance. So, what’s the point?

Well, if we look at the size of the corresponding indexes:

SQL> select index_name, leaf_blocks from user_indexes where table_name='BOWIE';

INDEX_NAME                LEAF_BLOCKS
------------------------- -----------
BOWIE_CODE_GRADE_I               2490
BOWIE_CODE_GRADE_I_2             2908
BOWIE_CODE_GRADE_I_3             2769

We notice the original index has the smallest size as expected, as it doesn’t have to index the constant value. But then we notice that the index with the constant value as the number is somewhat larger than the index with the constant value as a space.

Why?

An index block dump of both indexes will highlight why:

First a partial leaf block dump of index with the “1” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 399
kdxcofbo 834=0x342
kdxcofeo 1652=0x674
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29444101=0x1c14805
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8020] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8004] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7988] flag: ——-, lock: 0, len=16
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 2; (2): c1 02
col 3; len 6; (6): 01 c0 80 0d 00 af

Next, a partial leaf block dump of index with the space ” ” as a constant:

Leaf block dump
===============
header address 925073508=0x37238064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 4
kdxcosdc 0
kdxconro 422
kdxcofbo 880=0x370
kdxcofeo 1706=0x6aa
kdxcoavs 826
kdxlespl 0
kdxlende 0
kdxlenxt 29447173=0x1c15405
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8021] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 05 00 d7
row#1[8006] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 09 00 87
row#2[7991] flag: ——-, lock: 0, len=15
col 0; len 1; (1): 80
col 1; len 1; (1): 80
col 2; len 1; (1): 20
col 3; len 6; (6): 01 c0 80 0d 00 af

We notice that the number requires 2 bytes, while the space only requires 1 byte.

So storing the constant as a single byte character, rather a 2 byte number is a free saving, which adds up with each and every index entry, by simply being a little more fastidious when selecting which constant value to use in this scenario.

March 2019 – New Webinar Dates Announced for “Oracle Indexing Internals and Best Practices” November 14, 2018

Posted by Richard Foote in Oracle Index Seminar, Oracle Indexes, Oracle Indexing Internals Webinar, Webinar.
add a comment

OMC Training

I’m very excited to announce two new Webinar events for my acclaimed “Oracle Indexing Internals and Best Practices” training event, running in March 2019 !!

For details of all the extensive content covered in the webinars, please visit my Indexing Seminar page.

The webinars will run for 4 hours each day, spanning a full week period (Monday to Friday) in various timezones that are friendly to different parts of the world.

So that’s 15+ hours of extensive and practical content that will be of benefit to not only DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high performance Oracle-based applications.

There are currently 2 webinar series scheduled. They are:

  • Webinar Series 1: 4 – 8 March 2019 (start 7pm AEDT, end 11pm AEDT)
  • Webinar Series 2: 26 – 30 March 2019 (start 5am AEDT, end 9am AEDT)

Webinar Series 1 will be in timezones more agreeable to Eastern Asia/Europe. For example, they will start at 1:30pm local time in Mumbai, at 9:00am local time in Paris.

Webinar Series 2 will be in timezones more agreeable to the American Continents. For example they will start at in 1:00pm local time New York and 10:00am local time in San Francisco. (Note: The dates listed are as in Australia, they will actually run between Monday 25 November to Friday 29 March in the Americas).

The cost of each 5 x day series will be $1200.00 Australian Dollars (+GST if applicable and attending from within Australia).

Note: Numbers are strictly limited to ensure the smooth running of these events and enable the opportunity for all attendees to ask questions. One of my previous webinars was officially FULL, so please register early to avoid disappointment as webinars are not scheduled too regularly. 

Booking and Payment Instructions

To book your place, please email me at richard@richardfooteconsulting.comand I will send you an invoice with payment instructions. You can pay either by credit card via PayPal (you do not need a PayPal account for this), via a PayPal account or via direct bank transfer. Note: payment must be received before you can attend the webinar.

You can also pay for these webinars directly here if NOT attending from Australia:

Webinar Series 1: 4-8 March 2019 (start 7pm AEDT, end 11pm AEDT): Buy Now Button

Webinar Series 2: 26-30 March 2019 (start 5am AEDT, end 9am AEDT): Buy Now Button

 

Once registered, you will be sent a unique link for each booking with instructions on how to attend the webinar. Prior to the webinar, you will also be sent a soft copy of the webinar materials, with 850+ pages of amazing content, that includes many useful tips and strategies to maximise the benefits of indexes on application/database performance and scalability.

Up to date details and terms and conditions can be found at my Indexing Webinar web page.

If you have any questions, please don’t hesitate to contact me.

FIRST_ROWS_10 CBO Is Hopeless, It’s Using The Wrong Index !! (Weeping Wall) November 5, 2018

Posted by Richard Foote in ALL_ROWS, CBO, Exadata, FIRST_ROWS_10, Oracle Indexes, Siebel.
6 comments

low

There’s an organisation I had been dealing with on and off over the years who were having all sorts of issues with their Siebel System and who were totally convinced their performance issues were due directly to being forced to use the FIRST_ROWS_10 optimizer. I’ve attempted on a number of occasions to explain that their issues are not actually due to some unexplained deficiency with the FIRST_ROWS_10 CBO, but due to a number of other root issues, sadly to no avail. I recently found out they’re still struggling with performance issues, so I thought it might be worth looking at a classic example of where it looks simplistically like a FIRST_ROWS_10 CBO issue, but the “real” underlying problem(s) are actually quite different. Just in case other sites are likewise struggling to identify such SQL performance issues when using FIRST_ROWS_10…

This is a somewhat simplified version of their most common issue. Firstly, I create a table with 3M rows that has two columns of interest. The CODE column is initially populated with two evenly distributed distinct values and the GRADE column which only has the one distinct value.

SQL> create table bowie (id number not null, code number not null, grade number not null, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,2), 42, 'David Bowie'
from dual connect by level > = 3000000;

3000000 rows created.

SQL> commit;

Commit complete.

I then update a few rows (just 5) so that the CODE column now has a few occurrences of a third distinct value and update 5 other rows so the GRADE column has a few occurrences of a second distinct value:

SQL> update bowie set code=2
where id in (42, 4343, 400042, 1420001, 2000042);

5 rows updated.

SQL> commit;

Commit complete.

SQL> update bowie set grade=2
where id in (4212, 434323, 440423, 1440002, 2400642);

5 rows updated.

SQL> commit;

Commit complete.

We now introduce “a root problem”, not collecting histograms on these two columns, such that the CBO doesn’t recognise that the values in these columns are not evenly distributed. The CBO will incorrectly assume the rare CODE values actually occur 1M times as it will assume even distribution across the three distinct values. Now this is NOT the specific root issue at this organisation as they do gather histograms, but they do have numerous issues with the CBO not picking the correct cardinality/selectivity of their SQL.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

We next create indexes on these two CODE and GRADE columns:

SQL> create index bowie_code_i on bowie(code);

Index created.

SQL> create index bowie_grade_i on bowie(grade);

Index created.

Let’s now run the following query using the session default FIRST_ROWS_10 optimizer. The query basically returns just the 5 rows that have a CODE = 2, but sorts the result set by the GRADE column:

SQL> alter session set optimizer_mode=first_rows_10;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 3133133456

---------------------------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows  | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |               |    10 |   240 |       4 (0) | 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | BOWIE         | 1000K |   22M |       4 (0) | 00:00:01 |
|  2 | INDEX FULL SCAN             | BOWIE_GRADE_I |    31 |       |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=2)

Statistics
----------------------------------------------------------
    1 recursive calls
    0 db block gets
17518 consistent gets
 5865 physical reads
    0 redo size
  858 bytes sent via SQL*Net to client
  572 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    5 rows processed

The FIRST_ROWS_10 optimizer has come up with a terrible execution plan. Instead of using the index on the CODE column to quickly access the 5 rows of interest and then sort them, it uses an INDEX FULL SCAN via the GRADE column index.

This results in a massively inefficient execution plan (note 17,518 consistent gets), as the CBO has to basically read the entire table via this GRADE index to eventually find the 5 rows of interest that have a CODE=2.

The FIRST_ROWS_10 certainly appears to be dreadful…

But before you go off and demand that Oracle not use this CBO, the key question to ask here is WHY? Why is the FIRST_ROWS_10 CBO deciding to use what is clearly the wrong index?

If we can understand why this is happening, perhaps we can then address what is clearly a problem with an appropriate solution that might not just fix this query but many many like this. And perhaps we can address this problem with an optimal solution and not with a band-aid fix or with a sub-optimal solution that is beneficial for just this one query.

Now there are actually two clues within this execution plan regarding what is really going on.

The first is that the execution plan is estimating that 1000K rows are to be processed by the table access after the filter on CODE=2 has been applied. But this is not correct, there are only 5 such rows.

The second clue that not all is right is that the CBO is estimating 10 rows are to be retrieved via this FIRST_ROWS_10 access plan (as Oracle is trying here to come up with the best plan to retrieve the first 10 rows as efficiently as possible), however there are only 5 rows that meet this SQL criteria. The CBO is not picking up that less than the 10 mandatory rows will actually be fetched and only need to be considered

I always recommend a couple of things to look at if one ever comes across the scenario where the FIRST_ROWS(N) optimizer doesn’t appear to be behaving itself. The first is to look at a 10053 trace and see what the CBO costings are for the various alternative plans. The second is to simply run the query with the ALL_ROWS CBO to see what it’s initial deliberations might be, noting that the CBO has to perform an initial pass with ALL_ROWS to see the data density of the various steps to accurately come up with the optimal FIRST_ROWS(N) costings. Without knowing the potential full result set, The FIRST_ROWS_10 optimizer wouldn’t be able to determine for example how much of a Full Index Scan actually needs to be processed before it likely finds the necessary rows of interest.

So let’s see what costings and plan we get with the ALL_ROWS CBO:

SQL> alter session set optimizer_mode=all_rows;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2027917145

------------------------------------------------------------------------------------
| Id | Operation         | Name  | Rows  | Bytes |TempSpc | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       | 1000K |   22M |        |   11173 (8) | 00:00:01 |
|  1 | SORT ORDER BY     |       | 1000K |   22M |    34M |   11173 (8) | 00:00:01 |
|* 2 | TABLE ACCESS FULL | BOWIE | 1000K |   22M |        |   3387 (11) | 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("CODE"=2)

Statistics
----------------------------------------------------------
    0 recursive calls
    0 db block gets
11897 consistent gets
    0 physical reads
    0 redo size
  858 bytes sent via SQL*Net to client
  572 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    5 rows processed

The root issue now becomes somewhat obvious…

ALL_ROWS is not correctly estimating 5 rows are to be returned, but 1000K rows !! Oracle is not estimating that using the index on the CODE column will only fetch 5 rows, but using such an index would retrieve 1000K rows. Using such a CODE index to access 1M rows would therefore be viewed as being much too expensive.

Importantly, the sort step would therefore not sort 5 rows, but would be required to sort 1000K rows, which would be extremely expensive.

Oracle thinks all this when deciding the best way to access the first 10 rows of interest as efficiently as possible with the FIRST_ROWS_10 CBO.

Rather than using the CODE index to first retrieve all 1000K rows, to then sort all 1000K rows before finally being able to return the first 10 rows of interest, Oracle instead does the following.

It uses the index of the GRADE column to retrieve the first 10 rows of interest. As 1 in 3 of all rows are estimated to be of interest (1M out of the 3M rows, because we’re interested in 1 of the 3 distinct CODE values), it estimates it doesn’t actually have to perform much of the FULL INDEX SCAN to find these initial 10 rows of interest.

As the GRADE index was accessed, it also means these first 10 rows would have been fetched in GRADE order. Therefore, there is no need to perform the SORT BY step as the index guarantees the data to be fetched in GRADE order. Not having to perform this sort makes this plan fantastically cheap compared to any other option that first requires all 1000K  of data to be fetched and sorted.

The execution plan when using ALL_ROWS is therefore deciding to perform a Full Table Scan (FTS) to access efficiently what the CBO thinks will be the 1000K rows of interest. This would be much more efficient than accessing all 1000K of interest via either the CODE index (followed by the sort) or via the GRADE index (in which the sort is not required) but requires all the table to be accessed by the index.

Now for this organisation, this FTS is not an entirely bad thing. Why? Because they run Siebel on an Exadata platform !!

Exadata takes this FTS and performs a Smart Scan. And the associated Storage Index can automatically determine this data is extremely rare and potentially only access the relatively few storage regions within the table where these few values of interest reside.

The query goes from taking 60 seconds to run using the “awful” FIRST_ROWS_10 CBO to just 2 seconds with the “brilliant” ALL_ROWS CBO.

However, the “root issue” here is not the FIRST_ROWS_10 CBO but the fact it is being fed insufficient statistics to make an accurate estimate of the true cost. As with all CBOs, rubbish stats in, rubbish plan out…

If we fix the actual root issue and provide the CBO with the necessary statistics to make the correct cardinality/selectivity estimates (in this example by collecting histograms on the skewed data columns):

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 75');

PL/SQL procedure successfully completed.

And now re-run the query again with ALL_ROWS:

SQL> alter session set optimizer_mode=all_rows;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2357877461

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |              |    5 |   120 |      5 (20) | 00:00:01 |
|  1 | SORT ORDER BY                       |              |    5 |   120 |      5 (20) | 00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE        |    5 |   120 |       4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE_CODE_I |    5 |       |       3 (0) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=2)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  8 consistent gets
  0 physical reads
  0 redo size
858 bytes sent via SQL*Net to client
572 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  5 rows processed

We notice that the ALL_ROWS CBO is now correctly determining the correct query cardinality (5 rows) and is now using the CODE index to retrieve the correctly estimated 5 rows. It’s happy to now perform the sort as the sort of 5 rows has a trivial cost (the cost just goes up by 1).

If we now run the query using the default session FIRST_ROWS_10 CBO:

SQL> alter session set optimizer_mode=first_rows_10;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2357877461

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |              |    5 |   120 |      5 (20) | 00:00:01 |
|  1 | SORT ORDER BY                       |              |    5 |   120 |      5 (20) | 00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE        |    5 |   120 |       4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE_CODE_I |    5 |       |       3 (0) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=2)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  8 consistent gets
  0 physical reads
  0 redo size
858 bytes sent via SQL*Net to client
572 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  5 rows processed

We note it’s also using the same execution plan as ALL_ROWS, as the FIRST_ROWS_10 CBO likewise is correctly determining that using the CODE index is now a very efficient manner in which to access just the 5 rows of interest.

Here’s the thing. If you are returning 10 or less rows, the optimal execution plan for both FIRST_ROWS_10 and ALL_ROWS should ultimately be the same, as they both should cost the associated plans the same way.

By correctly identifying and addressing the root issue here (poor cardinality/selectivity estimates), we get the following considerable benefits:

  • We now have an execution plan that doesn’t take 2 seconds to run, but 0.02 of a second (we are now down to just 8 consistent gets). This is much more efficient than the Exadata FTS and allows for the optimal plan to be selected, not just a better plan.
  • We automatically fix ALL execution plans for all queries that are based on this combination of table and filtering columns
  • We correctly understand and identify issues with any other table that likewise has the same costing issue
  • We don’t unnecessarily have to add ALL_ROWS hints or use ALL_ROWS based baselines to address all such related issues
  • We don’t implement a fix (such as baselines) that becomes ineffective if we were to even change the underlying SQL with any subsequent release
  • We don’t attempt to fix the relatively few problem queries with a global change (such as changing to ALL_ROWS CBO) that can potentially impact negatively as many queries as get addressed
  • We don’t spend years demanding futilely that Oracle Support allow Siebel with ALL_ROWS based session settings

So if you’re running Siebel and having performance issues, don’t just assume it’s some deficiency with the FIRST_ROWS_10 CBO, spend the time to get to the bottom of any root issues (e.g. CBO bugs with getting histograms costs incorrect for CHAR columns, missing statistics on small tables, poor default settings when returning empty result sets, Siebel bugs with Cartesian Joins, missing extended statistics, missing indexes, etc. etc.)…

In a future post, I’ll explain why playing around with the unsupported _sort_elimination_cost_ratio parameter (again, always a bad idea when trying to address specific SQL tuning issues) is ultimately futile when trying to get FIRST_ROWS_10 to not use the clearly inefficient index that eliminates the sort…

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part IV” (Hallo Spaceboy) October 31, 2018

Posted by Richard Foote in Global Indexes, Local Indexes, Oracle Indexes, Partitioned Indexes.
add a comment

Hallo Spaceboy

In Part I, Part II and Part III we looked at some advantages of Global Indexes that may not be obvious to some.

One of the advantages of a Local Index vs. Non-Partitioned Global Index is that a Local Index being a smaller index structures may have a reduced BLEVEL in comparison. This can save a logical read each and every time the index is accessed.

However, if this is a performance concern for usage of a corresponding Global Index, this is a key reason why Global Indexes can likewise be partitioned.

As we saw in the demo in Part III, when the Global Index is used in a query that uses a predicate with the table partitioned key:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42 and
release_date between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 1081241859

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                    | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                         |    1 |    25 |      13 (0) | 00:00:01 |       |        |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE               |    1 |    25 |      13 (0) | 00:00:01 |     7 |      7 |
|* 2 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |       |        |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RELEASE_DATE"=TO_DATE('2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

The query required 5 consistent gets.

But when the Local Index is used with a reduced BLEVEL:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42 and
release_date between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 3499166408

--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                          | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                               |    1 |    25 |       2 (0) | 00:00:01 |        |       |
|  1 | PARTITION RANGE SINGLE                    |                               |    1 |    25 |       2 (0) | 00:00:01 |      7 |     7 |
|* 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE                     |    1 |    25 |       2 (0) | 00:00:01 |      7 |     7 |
|* 3 | INDEX RANGE SCAN                          | BIG_BOWIE_TOTAL_SALES_LOCAL_I |    1 |       |       1 (0) | 00:00:01 |      7 |     7 |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("RELEASE_DATE"<=TO_DATE(' 2017-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

The query used just 4 consistent gets (vs. 5) as a result of the reduction of 1 for the BLEVEL.

So to have the best of both worlds, excellent performance when the query doesn’t contain the table partitioned columns in a predicate and excellent performance to match Local Indexes when the table partitioned key is specified, a Global Index can also be partitioned into many, smaller index structures.

However, unlike a Local Index, a Global Partitioned Index can be partitioned in a manner totally different to that of the table (indeed, the table doesn’t even have to be partitioned).

In this example, the Global Index on the TOTAL_SALES column is partitioned based on TOTAL_SALES (unlike the table which is partitioned based on RELEASE_DATE) and partitioned into 16 partitions (unlike the table which has 8 partitions):

SQL> CREATE INDEX big_bowie_total_sales_global_i ON big_bowie(total_sales)
2 GLOBAL PARTITION BY RANGE (total_sales)
3 (PARTITION P1 VALUES LESS THAN (12501),
4 PARTITION P2 VALUES LESS THAN (25001),
5 PARTITION P3 VALUES LESS THAN (37501),
6 PARTITION P4 VALUES LESS THAN (50001),
7 PARTITION P5 VALUES LESS THAN (62501),
8 PARTITION P6 VALUES LESS THAN (75001),
9 PARTITION P7 VALUES LESS THAN (87501),
10 PARTITION P8 VALUES LESS THAN (100001),
11 PARTITION P9 VALUES LESS THAN (112501),
12 PARTITION P10 VALUES LESS THAN (125001),
13 PARTITION P11 VALUES LESS THAN (137501),
14 PARTITION P12 VALUES LESS THAN (150001),
15 PARTITION P13 VALUES LESS THAN (162501),
16 PARTITION P14 VALUES LESS THAN (175001),
17 PARTITION P15 VALUES LESS THAN (187501),
18 PARTITION P16 VALUES LESS THAN (MAXVALUE)) invisible;

Index created.

SQL> select index_name, partition_name, blevel, leaf_blocks

from dba_ind_partitions where index_name='BIG_BOWIE_TOTAL_SALES_GLOBAL_I';

INDEX_NAME                     PARTITION_NAME           BLEVEL LEAF_BLOCKS
------------------------------ -------------------- ---------- -----------
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P1                            1         335
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P10                           1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P11                           1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P12                           1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P13                           1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P14                           1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P15                           1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P16                           1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P2                            1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P3                            1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P4                            1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P5                            1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P6                            1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P7                            1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P8                            1         349
BIG_BOWIE_TOTAL_SALES_GLOBAL_I P9                            1         349

We notice that each Global Index partition now only has a BLEVEL of 1, the same as the corresponding Local Index.

As such, the performance of the Global Index now matches that of the Local Index when the table partition key is referenced in an SQL predicate:

SQL> alter index BIG_BOWIE_TOTAL_SALES_LOCAL_I invisible;

Index altered.

SQL> alter index BIG_BOWIE_TOTAL_SALES_GLOBAL_I visible;

Index altered.

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42 and release_date
between '01-JAN-2017' and '31-JUL-2017';

         ID  ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 2458305506

----------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                           | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
----------------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                                |    1 |    25 |      11 (0) | 00:00:01 |        |       |
|  1 | PARTITION RANGE SINGLE                     |                                |    1 |    25 |      11 (0) | 00:00:01 |      1 |     1 |
|* 2 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE                      |    1 |    25 |      11 (0) | 00:00:01 |      7 |     7 |
|* 3 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_GLOBAL_I |    1 |       |       1 (0) | 00:00:01 |      1 |     1 |
----------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("RELEASE_DATE"=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

 

So Global Indexes can perform optimally, regardless of whether the table partition key is specified in a predicate or not.

The same can’t always be said for a corresponding Local Index.

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part III” (Ricochet) October 25, 2018

Posted by Richard Foote in Block Dumps, Global Indexes, Local Indexes, Oracle Indexes, Partitioned Indexes, Partitioning.
1 comment so far

ricochet

In Part I and Part II of this series, we looked at how Global Indexes can effectively perform “Partition Pruning” when the partition keys are specified in SQL predicates, by only using those index entries that have a Data Object of interest stored within the index Rowids.

In this piece, I’ll cover the key performance advantage that Global Indexes have over Local Indexes and why I generally recommended Global Indexes from a purely performance perspective.

First, a quick recap of how the Global Index performed. Following is the performance of a query where the table partitioned key is specified in the query:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42 and
      release_date between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 1081241859

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                    | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                         |    1 |    25 |      13 (0) | 00:00:01 |        |       |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE               |    1 |    25 |      13 (0) | 00:00:01 |      7 |     7 |
|* 2 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |        |       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RELEASE_DATE"=TO_DATE('2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

As discussed previously, at just 5 consistent gets, it’s very efficient as only the table blocks that reside in possible partitions of interest are only accessed.

The following query selects all TOTAL_SALES values of interest, with no partition key predicate:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42;

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
    400041         42         42 28-JAN-12          42
   1800041         42         42 28-JAN-12          42
    800041         42         42 03-MAR-13          42
   1200041         42         42 07-APR-14          42
   1600041         42         42 12-MAY-15          42
    200041         42         42 12-MAY-15          42
    600041         42         42 15-JUN-16          42
   1000041         42         42 20-JUL-17          42
        41         42         42 24-AUG-18          42
   1400041         42         42 24-AUG-18          42

10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 1761527485

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                    | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                         |   10 |   250 |      13 (0) | 00:00:01 |        |       |
|  1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE               |   10 |   250 |      13 (0) | 00:00:01 |  ROWID | ROWID |
|* 2 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |        |       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
  14 consistent gets
   0 physical reads
   0 redo size
1184 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

So the index is scanned (4 consistent gets) and 10 consistent gets for the 10 rows accessed (as the clustering here is poor) for a total of 14 consistent gets.

Let’s now compare this to an equivalent Local Index.

SQL> create index big_bowie_total_sales_local_i
on big_bowie(total_sales) local invisible;

Index created.

SQL> alter index big_bowie_total_sales_i invisible;

Index altered.

SQL> alter index big_bowie_total_sales_local_i visible;

Index altered.

If we compare the size characteristics between the two indexes we notice a couple of important differences:

SQL> select index_name, blevel, leaf_blocks from dba_indexes where table_name='BIG_BOWIE';

INDEX_NAME                         BLEVEL LEAF_BLOCKS
------------------------------ ---------- -----------
BIG_BOWIE_TOTAL_SALES_I                 2        5585
BIG_BOWIE_TOTAL_SALES_LOCAL_I           1        4444

SQL> select index_name, partition_name, blevel, leaf_blocks
from dba_ind_partitions where index_name='BIG_BOWIE_TOTAL_SALES_LOCAL_I';

INDEX_NAME                     PARTITION_NAME           BLEVEL LEAF_BLOCKS
------------------------------ -------------------- ---------- -----------
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2011                   1         525
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2012                   1         581
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2013                   1         579
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2014                   1         579
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2015                   1         579
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2016                   1         581
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2017                   1         580
BIG_BOWIE_TOTAL_SALES_LOCAL_I  ALBUMS_2018                   1         440

8 rows selected.

The first difference is that the corresponding Local index segments have a reduced BLEVEL (just 1) when compared to the Global Index (value of 2). A reduction in BLEVEL is quite possible as instead of one “big” index segment, we now have 8 “smaller” index segments.

However, if we look at the overall size of both indexes, we notice that the Local Index (at 4444 leaf blocks) is somewhat smaller than the Global Index (5585 leaf blocks). This is due to the Rowids of Local Indexes not having to be the extended Global Index 10 byte version (which contains the 4 byte Data Object Id), but the standard 6 byte version. Local Indexes can only reference the one table partition and so it’s unnecessary to store the corresponding Data Object Id within the Rowid.

A partial block dump of a Local Index leaf block:
Leaf block dump
===============
header address 924483684=0x371a8064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 483
kdxcofbo 1002=0x3ea
kdxcofeo 1823=0x71f
kdxcoavs 821
kdxlespl 0
kdxlende 0
kdxlenxt 29412237=0x1c0cb8d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 47
col 1; len 6; (6): 01 c0 20 7b 00 a6
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 47
col 1; len 6; (6): 01 c0 22 3a 00 00
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 48
col 1; len 6; (6): 01 c0 20 7b 00 a7

Shows that the Rowids are only 6 bytes.

If we re-run the query that references the partition key in a SQL predicate:

SQL> SELECT * FROM big_bowie
WHERE total_sales = 42 and
      release_date between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 3499166408

--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                          | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                               |    1 |    25 |       2 (0) | 00:00:01 |        |       |
|  1 | PARTITION RANGE SINGLE                    |                               |    1 |    25 |       2 (0) | 00:00:01 |      7 |     7 |
|* 2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE                     |    1 |    25 |       2 (0) | 00:00:01 |      7 |     7 |
|* 3 | INDEX RANGE SCAN                          | BIG_BOWIE_TOTAL_SALES_LOCAL_I |    1 |       |       1 (0) | 00:00:01 |      7 |     7 |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("RELEASE_DATE"<=TO_DATE(' 2017-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  4 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We notice that this is slightly more efficient with only 4 consistent gets, when previously the Global Index required 5 consistent gets. This is directly due to the reduction in the BLEVEL.

So this is a good thing, especially if this query is frequently executed.

If we now run the query without the partition key SQL predicate:

SQL> SELECT * FROM big_bowie WHERE total_sales = 42;

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
    400041         42         42 28-JAN-12          42
   1800041         42         42 28-JAN-12          42
    800041         42         42 03-MAR-13          42
   1200041         42         42 07-APR-14          42
   1600041         42         42 12-MAY-15          42
    200041         42         42 12-MAY-15          42
    600041         42         42 15-JUN-16          42
   1000041         42         42 20-JUL-17          42
        41         42         42 24-AUG-18          42
   1400041         42         42 24-AUG-18          42

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3527547124

--------------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                 | Name                          | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                          |                               |   10 |   250 |      15 (0) | 00:00:01 |        |       |
|  1 | PARTITION RANGE ALL                       |                               |   10 |   250 |      15 (0) | 00:00:01 |      1 |     8 |
|  2 | TABLE ACCESS BY LOCAL INDEX ROWID BATCHED | BIG_BOWIE                     |   10 |   250 |      15 (0) | 00:00:01 |      1 |     8 |
|* 3 | INDEX RANGE SCAN                          | BIG_BOWIE_TOTAL_SALES_LOCAL_I |   10 |       |       9 (0) | 00:00:01 |      1 |     8 |
--------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
  27 consistent gets
   0 physical reads
   0 redo size
1088 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

We notice that consistent gets have increased more significantly, up to 27 consistent gets when it was previously 14 consistent gets.

This is because instead of accessing the one Global Index structure, we are now forced to access all 8 Local index structures, as the required TOTAL_SALES value could potentially be found in any of the table partitions. So that’s a minimum of at least 2 consistent gets per Local Index (with an index of BLEVEL 1) that has to accessed even if there are actually no corresponding rows of interest in the particular table partition.

Imagine if this table had a 1000+ table partitions, you can easily see how the cost of using such Local Indexes can quickly become excessive.

So Local Indexes can be very problematic if the partition key is NOT referenced in the SQL or if the range of possible table partitions is excessive. The advantage of a Non-Partitioned index is that there is only the one index structure that need be accessed, regardless of the number of table partitions.

So what if you want to protect yourself from the possible ramifications of the table partition key not being referenced in SQL predicates, but you want to take advantage of the performance benefits of smaller index structures that might have a reduced index BLEVEL?

That’s the topic of Part IV in this series 🙂

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part II (Aladdin Sane) October 9, 2018

Posted by Richard Foote in Global Indexes, Index Internals, Local Indexes, Oracle Indexes, Partitioned Indexes, Partitioning, ROWID.
2 comments

aladdin sane

In Part I of this series, I highlighted how a Non-Partitioned Global Index on a Partitioned Table is able to effectively perform “Partition Pruning” by reading only the associated index entries to access just the table blocks of interest from relevant table partitions when the table partitioned keys are specified in an SQL Predicate.

Understanding how Oracle achieves this is key (pun fully intended) in understanding the associated advantages of Global Indexes.

Back in time before Oracle introduced Partitioning (pre-Oracle 8 days), the 6 byte ROWID was safely made up of the following components:

  • File Number
  • Block Number
  • Row Number

to uniquely determine the location of any given row.

If we look at a partial block dump of a leaf block from the index based on the Non-Partitioned table:

Leaf block dump
===============
header address 1881436260=0x70247064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 513
kdxcofbo 1062=0x426
kdxcofeo 1880=0x758
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29387269=0x1c06a05
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 c0 1d 68 00 18
row#1[8012] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 c0 24 c8 00 c1
row#2[8000] flag: ——-, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 c0 3a 1c 00 96

We notice that the ROWID for each index entry is the standard 6 bytes in size.

With the introduction of Oracle 8 and the Partitioning Option, the File Number was no longer unique, with this number of files (approx. 1K) now possible not for the database at large, but for each Tablespace (thus making Oracle able to cater for very large databases with there now being the option for so many more data files in a database).

This means for a Partitioned Table in which each table partition (or sub-partition) could potentially reside in different tablespaces, the associated file number (RELATIVE_FNO) within the ROWID is no longer unique. Therefore, for Global Indexes in which index entries span across all table partitions, the ROWID is extended to include the 4 byte Data Object Id. A specific object can only live in one tablespace and if Oracle knows the tablespace, Oracle can determine which specific file number the ROWID is referencing. So an extended ROWID is consists of:

  • Data Object Id
  • File Number
  • Block Number
  • Row Number

If we look at a partial block dump of a leaf block from the index based on the Partitioned table:

Leaf block dump
===============
header address 1881436260=0x70247064
kdxcolev 0
KDXCOLEV Flags = – – –
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=— is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 399
kdxcofbo 834=0x342
kdxcofeo 1652=0x674
kdxcoavs 818
kdxlespl 0
kdxlende 0
kdxlenxt 29385221=0x1c06205
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8020] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 10; (10): 00 01 57 4a 01 c0 5e cf 00 cc
row#1[8004] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 10; (10): 00 01 57 4a 01 c0 5f 74 00 e7
row#2[7988] flag: ——-, lock: 0, len=16
col 0; len 2; (2): c1 02
col 1; len 10; (10): 00 01 57 4b 01 c0 5c 32 00 c9

We notice that the ROWID for each index entry is now the extended 10 bytes in size as it includes the Data Object Id.

Storing the Data Object Id as part of the ROWID has various advantages, such as being able to asynchronously maintain index entries following table partition operations such as dropping a table partition (as discussed previously here).

However the key advantage of storing the Data Object Id as part of the ROWID is that this enables Oracle when using Global Indexes to automatically perform “Partition Pruning” (the ability to access only those partitions that can possibly contain data of interest), when the table partition key is specified in an SQL predicate.

When the table partition key is specified in an SQL predicate, Oracle can determine which table partitions can only contain such data and then only access the table blocks via the index ROWIDs that have corresponding Data Object Ids of interest. This is how in the example in Part I Oracle was able to only access just the table block that belongs in the table partition of interest, effectively performing predicate filtering at the index level, without unnecessarily having to access the table blocks at all from partitions that are not of interest.

This enables Global Indexes to have almost Local Index like performance in scenarios where the table partition key is specified in SQL predicates. Local Indexes do have the advantage of potentially having a reduced BLEVEL in that if you have say 100 table partitions, each Local Index would only have to be approx. 1/100 the size of the single, Non-Partitioned Index (although Global Indexes can in turn be partitioned if individual index size were problematic, even if the table were not partitioned). Additionally, Local Indexes don’t have to concern themselves with having to read through unnecessary index entries if index entries associated with a specific subset of table partitions were only of interest.

However, Global Indexes have a key performance advantage over Local Indexes which I’ll discussed in Part III.

 

“Hidden” Efficiencies of Non-Partitioned Indexes on Partitioned Tables Part I (The Jean Genie) October 4, 2018

Posted by Richard Foote in Global Indexes, Local Indexes, Non-Partitioned Indexes, Oracle Indexes, Partitioned Indexes, Partitioning.
5 comments

jean genie

When it comes to indexing a partitioned table, many automatically opt for Local Indexes, as it’s often assumed they’re simply easier to manage and more efficient than a corresponding Global Index.

Having smaller index structures that are aligned to how the table is partitioned certainly has various advantages. The focus in this little series is on the “more efficient” claim.

A key point that many miss is that a Non-Partitioned Index on a Non-Partitioned table is not exactly the same beast as a Non-Partitioned Index on a Partitioned Table. The purpose of this initial post is to illustrate this difference.

Let’s begin by creating a Non-Partitioned table that has a number of years worth of data:

SQL> CREATE TABLE big_ziggy (id number, album_id number, country_id number, release_date date, total_sales number);

Table created.

SQL> INSERT INTO big_ziggy
     SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
            mod(rownum,200000)+1
     FROM dual CONNECT BY LEVEL  2000000;

2000000 rows created.

SQL> commit;
      
Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_ZIGGY');

PL/SQL procedure successfully completed.

So we have a 2M row table with about 8 years worth of data (based on the RELEASE_DATE column) and a TOTAL_SALES column that has some 200,000 distinct columns throughout this period.

Let’s next create a standard Non-Partitioned index based on the TOTAL_SALES column:

SQL> create index big_ziggy_total_sales_i on big_ziggy(total_sales);

Index created.

If we now run a query to access the 10 rows with a value equal to 42:

SQL> SELECT * FROM big_ziggy WHERE total_sales = 42;

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1400041         42         42 24-AUG-18          42
    400041         42         42 28-JAN-12          42
   1000041         42         42 20-JUL-17          42
   1800041         42         42 28-JAN-12          42
    600041         42         42 15-JUN-16          42
    800041         42         42 03-MAR-13          42
   1200041         42         42 07-APR-14          42
   1600041         42         42 12-MAY-15          42
    200041         42         42 12-MAY-15          42
        41         42         42 24-AUG-18          42

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1252095634

---------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                    | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                         |   10 |   250 |      13 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BIG_ZIGGY               |   10 |   250 |      13 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BIG_ZIGGY_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
  14 consistent gets
   0 physical reads
   0 redo size
1184 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

We notice we need 14 consistent gets to access these 10 rows, 4 gets for index block accesses and 10 gets to access the relevant rows from the table blocks (as we have a terrible clustering due to the relevant data being distributed throughout the table).

If we run a query where we’re only interested in accessing data only within a specific year:

SQL> SELECT * FROM big_ziggy WHERE total_sales = 42 and release_date
between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 1252095634

---------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                    | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                         |    1 |    25 |      13 (0) | 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID BATCHED | BIG_ZIGGY               |    1 |    25 |      13 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BIG_ZIGGY_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RELEASE_DATE">=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND "RELEASE_DATE"<=TO_DATE(' 2017-07-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
 14 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

Even though we now only return the one row,  notice we still have to perform the same 14 consistent gets. That’s because the RELEASE_DATE column is NOT part of the index, so we still need to fetch all 10 matching rows with TOTAL_SALES=42 and then filter out those that don’t have a RELEASE_DATE of interest. The note above in the predicate information shows we now have this additional filtering taking place.

Let’s run the same queries on a table with identical data, but this time on a table that is partitioned based on the RELEASE_DATE column, with a partition for each years worth of data:

SQL> CREATE TABLE big_bowie(id number, album_id number, country_id number, release_date date, total_sales number)
2 PARTITION BY RANGE (release_date)
3 (PARTITION ALBUMS_2011 VALUES LESS THAN (TO_DATE('01-JAN-2012', 'DD-MON-YYYY')),
4 PARTITION ALBUMS_2012 VALUES LESS THAN (TO_DATE('01-JAN-2013', 'DD-MON-YYYY')),
5 PARTITION ALBUMS_2013 VALUES LESS THAN (TO_DATE('01-JAN-2014', 'DD-MON-YYYY')),
6 PARTITION ALBUMS_2014 VALUES LESS THAN (TO_DATE('01-JAN-2015', 'DD-MON-YYYY')),
7 PARTITION ALBUMS_2015 VALUES LESS THAN (TO_DATE('01-JAN-2016', 'DD-MON-YYYY')),
8 PARTITION ALBUMS_2016 VALUES LESS THAN (TO_DATE('01-JAN-2017', 'DD-MON-YYYY')),
9 PARTITION ALBUMS_2017 VALUES LESS THAN (TO_DATE('01-JAN-2018', 'DD-MON-YYYY')),
10 PARTITION ALBUMS_2018 VALUES LESS THAN (MAXVALUE));

Table created.

SQL> INSERT INTO big_bowie
     SELECT rownum, mod(rownum,5000)+1, mod(rownum,100)+1, sysdate-mod(rownum,2800),
            mod(rownum,200000)+1
     FROM dual CONNECT BY LEVEL  2000000;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=> null, tabname=> 'BIG_BOWIE');

PL/SQL procedure successfully completed.

Again, we create a standard, Non-Partitioned Index:

SQL> create index big_bowie_total_sales_i on big_bowie(total_sales);

Index created.

If we now run the equivalent of the first query:

SQL> SELECT * FROM big_bowie WHERE total_sales = 42;

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
    400041         42         42 28-JAN-12          42
   1800041         42         42 28-JAN-12          42
    800041         42         42 03-MAR-13          42
   1200041         42         42 07-APR-14          42
   1600041         42         42 12-MAY-15          42
    200041         42         42 12-MAY-15          42
    600041         42         42 15-JUN-16          42
   1000041         42         42 20-JUL-17          42
        41         42         42 24-AUG-18          42
   1400041         42         42 24-AUG-18          42

10 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1761527485

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                    | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                         |   10 |   250 |      13 (0) | 00:00:01 |        |       |
|  1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE               |   10 |   250 |      13 (0) | 00:00:01 |  ROWID | ROWID |
|* 2 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |        |       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
  14 consistent gets
   0 physical reads
   0 redo size
1184 bytes sent via SQL*Net to client
 624 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

We get the exact same performance, with the same 14 consistent gets necessary to access the 10 rows of interest.

If we now run the equivalent of the second query:

SQL> SELECT * FROM big_bowie WHERE total_sales = 42 and release_date between '01-JAN-2017' and '31-JUL-2017';

        ID   ALBUM_ID COUNTRY_ID RELEASE_D TOTAL_SALES
---------- ---------- ---------- --------- -----------
   1000041         42         42 20-JUL-17          42

Execution Plan
----------------------------------------------------------
Plan hash value: 1081241859

--------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation                                  | Name                    | Rows | Bytes | Cost (%CPU) | Time     | Pstart | Pstop |
--------------------------------------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                           |                         |    1 |    25 |      13 (0) | 00:00:01 |        |       |
|* 1 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED | BIG_BOWIE               |    1 |    25 |      13 (0) | 00:00:01 |      7 |     7 |
|* 2 | INDEX RANGE SCAN                           | BIG_BOWIE_TOTAL_SALES_I |   10 |       |       3 (0) | 00:00:01 |        |       |
--------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("RELEASE_DATE"=TO_DATE(' 2017-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

2 - access("TOTAL_SALES"=42)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  5 consistent gets
  0 physical reads
  0 redo size
885 bytes sent via SQL*Net to client
624 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  1 rows processed

We notice a key difference. Even though it’s equivalent to the same index as in the previous Non-Partitioned table and even though the index only contains just the TOTAL_SALES column, the number of consistent gets has dropped from 14 to just 5 consistent gets.

In this example, Oracle has clearly not had to fetch the rows from the table that do not match the RELEASE_DATE of interest. Even though the predicate information is listing the requirement for filtering to take place, this filtering has clearly been performed within the index, without having to actually fetch any of the rows that aren’t of interest.

The index is able to only access the row(s) of interest from the Partitioned Table…

This is the little “hidden efficiency” of Global Indexes on Partitioned Tables, which is what we effectively have here.

In Part II, I’ll discuss how Oracle does this additional filtering within the index and why understanding this is important in deciding which type of index to deploy, as from a “performance” perspective, Global Indexes are often the preferred option.

New Zealand: “Oracle Indexing Internals and Best Practices” Seminars November 2018. August 24, 2018

Posted by Richard Foote in Oracle Index Seminar, Oracle Indexes.
add a comment

seminar photo

Good news for those of you in beautiful New Zealand.

Due to popular demand, I’ll be returning to run some of my acclaimed “Oracle Indexing Internals and Best Practices” seminars in November 2018.

The dates and events are:

Wellington: 19-20 November 2018: Registration Here or Buy Directly Here Buy Now Button

Auckland: 21-22 November 2018: Registration Here or Buy Directly Here Buy Now Button

These seminars have been very successful and highly acclaimed. Remarkably, every attendee so far in 2018 has rated the seminar a 5 out of 5 training experience !!

Places are strictly limited, so don’t leave it too late to register.

For all the details regarding seminar content, visit my Indexing Seminar page.

As always, if you have any questions, don’t hesitate to contact me at richard@richardfooteconsulting.com.

“Let’s Talk Database” coming to Wellington, Auckland, Brisbane and Perth in September. August 17, 2018

Posted by Richard Foote in Let's Talk Database, Oracle Indexes.
add a comment

seminar photo

Due to popular demand, I’ve been asked by Oracle to again run some more “Let’s Talk Database” events in September. Dates and venues are as follows:

Monday, 3 September – Wellington (Wellington Oracle Office): Registration Link.

Tuesday, 4 September – Auckland (Auckland Oracle Office): Registration Link.

Wednesday, 12 September – Brisbane (Brisbane Oracle Office): Registration Link.

Thursday, 13 September – Perth (Perth Oracle Office): Registration Link.

 

Agenda:

8:30 – 9:00am – Registration and coffee

9:00 – 10:30am – Importance of Data Clustering

10:30 – 11:00am – Break

11:00 – 12:30pm – Oracle Database 18c – New Features

12:30 – 1:30pm – Lunch, Networking and Informal Q&A

 

Data Clustering: A Key To Developing High Performance & Scalable Apps”

Today’s agile applications have to deal with ever increasing data volumes; rich varieties of data types with their associated intricate/flexibility requirements; and complex hybrid cloud-based environments, where critical high volume transactional-based applications have to function in combination with equally important real-time advanced data analytics reporting solutions. As such, having an innovative data clustering strategy in combination with appropriate data-aware deployments is vital to ensure today’s complex applications are high-performing, scalable, and robust. Many of today’s applications struggle to perform or scale because they lack the necessary flexible indexing and data management strategies at the database layer. This session will demonstrate various innovative data clustering and indexing-based tricks and tactics that will ensure applications run as efficiently as possible, regardless of the size or complexity of the underlying data management layer.

“Oracle Database 18c New Features”

This session will look at some of the key new features and capabilities introduced in Oracle Database 18c. New features discussed include Memory Optimized Row Store for OLTP workloads, Database In-Memory for External Tables, Inline External Tables, In-Memory Database improvements, Zero Impact Grid Infrastructure Patching, Alter Partitioned Table Merge Online, Alter Table Modify Partitioned Table to Partitioned Table, Approximate Query improvements, Private Temporary Tables and Polymorphic Table Functions. The session will also discuss how to play with some of these new features now without the need for an Oracle Cloud account.”

FAQ: Webinars for “Oracle Indexing Internals and Best Practices” July 30, 2018

Posted by Richard Foote in Indexing Webinar, Oracle Indexes.
2 comments

omc-training-e1532332613780.jpg

I’ve been somewhat inundated with questions regarding the “Oracle Indexing Internals and Best Practices” webinar series I’ll be running in October and November since I announced both webinar series last week. So I’ve compiled the following list of frequently asked questions which I’m hoping will address most of those asked.

If you have any additional questions or you’re interested in attending either of these webinar series, please contact me at richard@richardfooteconsulting.com.

 

FAQ: Webinars for “Oracle Indexing Internals and Best Practices”.

 

How do I register? Please contact me at richard@richardfooteconsulting.com and I will give you all the necessary registration and payment instructions.

How much does it cost? To attend the full 5 day x 4 hours webinar series costs $1200.00 Australian Dollars. If you’re from Australia, the full cost will be $1320.00 which includes the 10% GST.

What are the payment options? You can pay either by direct bank transfer (I will provide you with all the necessary banking information) or by Credit Card (I will send you a PayPal invoice which allows for Credit Card payments. You do not need a PayPal account for this service).

Are there group discounts? Yes, if you have 3 or more persons from your organisation that is interested in attending a webinar, please contact me for group discount rates.

If my payment hasn’t come through yet, can I still join the webinar? No. For obvious reasons, I must receive full payment before you can be registered for the webinar. I try to provide plenty of prior warning before the webinars to enable obtaining managerial permission and completing payments. Please don’t leave it too late.

Can I get a refund? No. However, if you’re not able to join for some reason, you can either transfer the registration to someone else within your organisation, or you can attend a later webinar at not addition cost. Please contact me ASAP if you’re not able to attend.

Do you run customer dedicated webinars? Yes, if you have 10 or more persons in your organisation interested in attending a webinar, contact me to determine if a webinar just for your organisation might be the better option.

What are the start and end times for each webinar? They are detailed within the webinar description. All times listed are in local Australian time as I’m based in Australia. You will need to convert these times to determine your corresponding local times. Webinars are scheduled to be more favourable in different parts of the world so select the webinar series that best suits you. Note: because of time differences, the webinar could be run in a different day to those listed in your part of the world. For example, the webinar scheduled for 6-10 November 2018 starts on Tuesday, 6 November at 5am in Australia, but this is actually Monday, 5 November at 10am in San Francisco.

What is the schedule for each webinar series? Each webinar series runs for 5 days (Monday-Friday at the targeted time zone), with 4 hours of activity per day. Each day, there are 3 x 1 hour sessions, with a 20 minute break following each one. During the 20 minute break, there is an opportunity for further questions to be asked.

What webinars are currently scheduled?

  • Webinar Series 1: 8-12 October 2018 (start 7pm AEDT, end 11pm AEDT)
  • Webinar Series 2: 6-10 November 2018 (start 5am AEDT, end 9am AEDT)

Series 1 is in a time zone more suitable for those in Eastern Asia and Europe. Series 2 is more suitable for those in the Americas.

Is the material covered the same as the in person seminars? Yes, the same material is covered. There is however a little more time available during a webinar, so I might be able to cover more topics. Maybe. To see the webinar content, visit my Indexing Seminar page.

Will I be able to ask questions during the webinar? Yes. However, I will mute all connections during the webinar, with questions during the sessions possible via a texting interface. This will reduce noise issues and unnecessary questions from disrupting the presentations. I will try to answer appropriate questions during the presentations or during the 20 minute break following each presentation session. You can also ask me follow up questions after the webinar.

Do you get any webinar materials to keep? Yes. Each attendee will get a softcopy of the some 850+ pages of content in PDF format. This will allow everyone to review all the material after the webinar and to also cut ‘n’ paste the numerous demos I cover and re-run them in your own environments. Note this material is of course copyrighted and that replicating or distributing this material is strictly prohibited.

What webinar software is used? I will be using GoToWebinar.

Are the webinars recorded? No, not initially anyways. Remember, you do have a copy of all the webinar content covered in the supplied PDF.

How do I join the webinar? I will provide full instructions on how to join the webinar beforehand. You will be given a unique link that allows you to join in the fun. Do not distribute this link to anyone else as only one connection is possible per supplied link. I recommend downloading the GoToWebinar interface (I will provide the instructions) in plenty of time prior to joining in the webinar.

When should I login to the webinar? I will start the webinar session 30 minutes prior to the official start to give everybody plenty of time to logon and iron out any issues you might have.

Are there any exercises or hands on labs? No. There are parts where I get customer participation, but there are no labs or requirements to have a database environment on hand. That said, some people do like to periodically cut ‘n’ paste some of my demos from the supplied PDFs and run them in their own database environments during the class. However, the pace I set makes this difficult to do while keeping up with the presentation.

Do I get a certificate for attending? Yes. Everyone will get a certificate of attendance.

Is there a limit to the number of attendees per webinar? Yes. To keep the webinar running smoothly and to give everyone an opportunity to ask some questions, numbers will be strictly limited.

How often are these webinars scheduled? Unsure. I will try and run a webinar series in different time zone friendly every “few months”, but this will depend largely on demand and my availability.

Where can I ask additional questions? As always, please contact me at richard@richardfooteconsulting.com if you have any questions.

Is the “Oracle Indexing Internals and Best Practices” learning experience really as good as people say? Yes, it really is 🙂

 

For up to date details, please visit my Indexing Webinar page.

If you have any questions, please don’t hesitate to contact me at richard@richardfooteconsulting.com.

Announcement: Webinars for “Oracle Indexing Internals and Best Practices” Now Confirmed !! July 23, 2018

Posted by Richard Foote in Oracle Indexes.
add a comment

OMC Training

Exciting News !! I can now confirm the dates for my first webinars of my fully updated and highly acclaimed “Oracle Indexing Internals and Best Practice” seminar. For details of all the extensive content covered in the webinars, please visit my Indexing Seminar page.

The webinars will run for 4 hours each day, spanning a full week period (Monday to Friday) in various timezones that are friendly to different parts of the world.

So that’s 15+ hours of extensive content that will be of benefit to not only DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high performance Oracle-based applications.

There are currently 2 webinar series scheduled. They are:

  • Webinar Series 1: 8-12 October 2018 (start 7pm AEDT, end 11pm AEDT)
  • Webinar Series 2: 6-10 November 2018 (start 5am AEDT, end 9am AEDT)

Webinar Series 1 will be in timezones more agreeable to Eastern Asia/Europe. For example, they will start at 1:30pm local time in Mumbai, at 10:00am local time in Paris.

Webinar Series 2 will be in timezones more agreeable to the American Continents. For example they will start at in 1:00pm local time New York and 10:00am local time in San Francisco. (Note: The dates listed are as in Australia, they will actually run between Monday 5 November to Friday 9 November in the Americas).

The cost of each 5 x day series will be $1200.00 Australian Dollars (+GST if applicable and attending from within Australia).

Note: As this will be the first run of these webinars, numbers are strictly limited to ensure the smooth running of these events. Please register early to avoid disappointment as webinars are not scheduled too regularly.

Booking and Payment Instructions

To book your place, please email me at richard@richardfooteconsulting.com and I will send you an invoice with payment instructions. You can pay either by credit card via PayPal (you do not need a PayPal account for this), via a PayPal account or via direct bank transfer. Note: payment must be received before you can attend the webinar.

You can also pay for these webinars directly here if NOT attending from Australia:

Webinar Series 1: 8-12 October 2018  Buy Now Button

Webinar Series 2: 6-10 November 2018 Buy Now Button

Once registered, you will be sent a unique link for each booking with instructions on how to attend the webinar. Prior to the webinar, you will also be sent a soft copy of the webinar materials, with 800+ pages of amazing content, that includes many useful tips and strategies to maximise the benefits of indexes on application/database performance and scalability.

Up to date details and terms and conditions can be found at my Indexing Webinar web page.

If you have any questions, please don’t hesitate to contact me.

Announcement: Venue Confirmed For Upcoming Brussels “Oracle Indexing Internals and Best Practices” Seminar July 18, 2018

Posted by Richard Foote in Oracle Indexes.
add a comment

Richard Let's Talk Database Nov 2015

I can finally confirm the venue for my upcoming “Oracle Indexing Internals and Best Practices” seminar in beautiful Brussels, Belgium running on 27-28 September 2018.

The venue will be the Regus Brussels City Centre Training Rooms Facility, Avenue Louise / Louizalaan 65, Stephanie Square, 1050, Brussels.

Note: This will be the last public seminar I’ll run in Europe this year, so with strictly limited places, available places are likely to go fast.

This is a must attend seminar of benefit to not only DBAs, but also to Developers, Solution Architects and anyone else interested in designing, developing or maintaining high performance Oracle-based applications. It’s a fun, but intense, content rich seminar that is suitable for people of all experiences (from beginners to seasoned Oracle experts).

You can book your place now at: https://www.eventbrite.com.au/e/oracle-indexing-internals-best-practices-seminar-with-richard-foote-brussels-tickets-47703933750

The early bird rate is available until 7 September 2018, but it’s likely that places will all be sold out before then.

If you have any questions, or if you wish to pay directly via a raised invoice, then please contact me directly at richard@richardfooteconsulting.com ASAP.

Rebuilding Indexes: Danger With Clustering Factor Calculation (Chilly Down) July 17, 2018

Posted by Richard Foote in CBO, Clustering Factor, Data Clustering, Index Rebuild, Oracle Indexes, TABLE_CACHED_BLOCKS.
add a comment

chilly down

Let me start by saying if you don’t already following Jonathan Lewis’s excellent Oracle blog, do yourself a favour. In a recent article, Jonathan highlighted a danger with rebuilding indexes (or indeed creating an index) when used in relation to collecting index statistics with the TABLE_CACHED_BLOCKS preference.

I’ve discussed the importance of the TABLE_CACHED_BLOCKS statistics collection preference a number of times previously, but the issue discussed by Jonathan is worth repeating here.

Let me start by repeating a demo I’ve used previously, by creating a table stored in an ASSM tablespace with data that is well clustered, but reported as being badly clustered due to how the Clustering Factor (CF) is calculated by default.

Firstly, I create a simple table and sequence and run a procedure that populates the table with a monotonically increasing ID column  populated via the sequence. But importantly, the procedure is executed concurrently from 3 separate sessions such that the monotonically increasing ID values are not stored in the table in precisely ID order as each of the 3 sessions inserts rows into different sets of table blocks:

SQL> create table bowie_assm (id number, name varchar2(42));

Table created.

SQL> create sequence bowie_assm_seq order;

Sequence created.

SQL> create or replace procedure pop_bowie_assm as
2 begin
3 for i in 1..100000 loop
4 insert into bowie_assm values (bowie_assm_seq.nextval, 'DAVID BOWIE');
5 commit;
6 end loop;
7 end;
8 /

Procedure created.

The following is executed concurrently in 3 different sessions:

SQL> exec pop_bowie_assm

PL/SQL procedure successfully completed.

If you can imagine 3 different blocks within the table, block one has rows with ID values 1,4,7,10,13,16…, block two has rows with ID values 2,5,8,11,14,17… and block three has rows with ID values 3,6,9,12,15,18…

So the data is well clustered in that the data for a large number of consecutive IDs are stored within a few blocks, but they’re not stored precisely in ID order within the table.

If we now create an index on the ID column and look at the Clustering Factor (CF) of the index:

SQL> create index bowie_assm_id_i on bowie_assm(id);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE_ASSM');

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

We note the calculated CF is extremely poor at 219416 (a value much closer to the number of index entries than the number of blocks in the table) as the default calculation notes that most index entries have a rowid that points to a different table block to the previous index entry rowid.

If we run a query that only requires a moderate number of rows (approx. 0.13% of the table) to be returned:

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows | Bytes | Cost (%CPU) |  Time     |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            |  389 |  6613 |    282 (11) |  00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE_ASSM |  389 |  6613 |    282 (11) |  00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
 974 consistent gets
   0 physical reads
   0 redo size
8869 bytes sent via SQL*Net to client
 883 bytes received via SQL*Net from client
  27 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
 388 rows processed

We note the CBO decides to use a Full Table Scan (FTS) as the index is too costly and inefficient to use with such a poor CF value.

However, if say retrieving 100 rows, the CBO thinks it needs to visit many more table blocks than the 3 blocks that in actual fact contain the 100 rows of interest.

The TABLE_CACHED_BLOCKS statistics preference allows us to modify how the CF is calculated by not incrementing the CF value if an index rowid points to a block that was visited just TABLE_CACHED_BLOCKS ago.

If we now re-calculate the CF but with the TABLE_CACHED_BLOCKS preference set to say 42:

SQL> exec dbms_stats.set_table_prefs(ownname=>user, tabname=>'BOWIE_ASSM', pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ASSM_ID_I',estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000               909

We notice the CF has dropped significantly, down to just 909 from its previous 219416 value.

If we now re-run the same query as before:

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time      |
-------------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |                 |  389 |  6613 |       4 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE_ASSM      |  389 |  6613 |       4 (0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_ASSM_ID_I |  389 |       |       2 (0) |  00:00:01 |
-------------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
   0  recursive calls
   0  db block gets
   6  consistent gets
   0  physical reads
   0  redo size
8734  bytes sent via SQL*Net to client
 608  bytes received via SQL*Net from client
   2  SQL*Net roundtrips to/from client
   0  sorts (memory)
   0  sorts (disk)
 388  rows processed

We notice the CBO now automatically decides to use the index and more importantly, that at just 6 consistent gets, the query is now much more efficient as a result.

The index was always the more efficient access method, but because of the poor CF that was previously calculated, the CBO got it wrong. Now that a more “accurate” CF is calculated, all is now well.

However, if we now decide to rebuild this index:

alter index bowie_assm_id_i rebuild;

Index altered.

SQL> select * from bowie_assm where id between 42 and 429;

388 rows selected.

Execution Plan
--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows | Bytes | Cost (%CPU) | Time      |
--------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |            |  389 |  6613 |    282 (11) |  00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE_ASSM |  389 |  6613 |    282 (11) |  00:00:01 |
--------------------------------------------------------------------------------

Statistics
----------------------------------------------------------
   3 recursive calls
   0 db block gets
 956 consistent gets
   0 physical reads
   0 redo size
4094 bytes sent via SQL*Net to client
 608 bytes received via SQL*Net from client
   2 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
 388 rows processed

So we’re back to the less efficient FTS. Why ? A look at the CF reveals the problem:

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

When the index is rebuilt and so when the index statistics are implicitly recalculated, the TABLE_CACHED_BLOCKS preference is ignored. This applies even if this preference is set at the schema or database level:

SQL> exec dbms_stats.set_schema_prefs(ownname=>user, pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.set_database_prefs(pname=>'TABLE_CACHED_BLOCKS', pvalue=>42);

PL/SQL procedure successfully completed.

SQL> alter index bowie_assm_id_i rebuild online;

Index altered.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

This issue also applies when an index is newly created, any TABLE_CACHED_BLOCKS setting is ignored, until the time when statistics are again collected via DBMS_STATS:

SQL> drop index bowie_assm_id_i;

Index dropped.

SQL> create index bowie_assm_id_i on bowie_assm(id);

Index created.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000            219416

SQL> exec dbms_stats.gather_index_stats(ownname=>user, indname=>'BOWIE_ASSM_ID_I',estimate_percent=> null);

PL/SQL procedure successfully completed.

SQL> SELECT t.table_name, i.index_name, t.blocks, t.num_rows, i.clustering_factor
2 FROM user_tables t, user_indexes i
3 WHERE t.table_name = i.table_name AND i.index_name='BOWIE_ASSM_ID_I';

TABLE_NAME      INDEX_NAME               BLOCKS   NUM_ROWS CLUSTERING_FACTOR
--------------- -------------------- ---------- ---------- -----------------
BOWIE_ASSM      BOWIE_ASSM_ID_I            1000     300000               909

This is currently being investigation by Oracle as unpublished bug 28292026.

Again, another example of the dangers of blindly rebuilding indexes without a valid justification…

Index Column Order – Impact On Index Branch Blocks Part II (The Weeping Song) July 5, 2018

Posted by Richard Foote in Index Column Order, Index Compression, Oracle Indexes.
add a comment

weeping song

In Part I, I discussed how the order of columns in an index makes no real difference to the effectiveness of the index if all columns are referenced via equality predicates.

If the leading column has a high number of distinct columns, it might result in less necessary data within index branches as less data is required to determine the unique path down to the leaf block containing the first index entry of interest. This might save a moderate number of index branch blocks. The number of branch blocks though has a trivial impact on index performance, if as in the vast majority of cases, the index height remains the same.

However, if one can potentially significantly reduce the number of required leaf blocks within an index, this might not only also significantly reduce the number of associated index branch blocks, but obviously the overall size of the index. This is possible with Basic Index Compression, but such compression is only possible if the leading column(s) has relatively few distinct values.

So going back to the demo in Part I, when the index was created with the ID column leading (which had many distinct values):

SQL> create index ziggy_id_code_i ON ziggy(id, code);

Index created.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14135         23      176612   113264736  101146313

We note the size of the index, with 14135 leaf blocks and 23 branch blocks.

If we now attempt to compress this index with basic index compression:

SQL> alter index ziggy_id_code_i rebuild compress;

Index altered.

SQL> analyze index ziggy_id_code_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      15795         26      197435   126505652  113167136

We notice basic index compression has been totally ineffective. In fact, the index has increased in size with there now being 15795 leaf blocks and 26 branch blocks. The number of compressed index columns makes no difference, as it’s the leading column with high distinct values that is the problem here.

That’s because the de-duplication at the leaf block level necessary for effective basic index compression is impossible with the ID column leading as there are little to no replicated column values. Basic index compression must have high numbers of replicated column values in at least the leading column(s) to be effective.

If we look at the index with the replicated CODE column as the leading column:

SQL> create index ziggy_code_id_i on ziggy(code,id);

Index created.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3      14125         83      656341   113666656  101626042

We notice although the number of leaf blocks are similar to the previous non-compressed index at 14125 leaf blocks, at 83 there are more branch blocks (previous index had just 23). As discussed in Part I, this is because the relatively large sized CODE column must be stored in the branch blocks.

However, this index is compressible with the leading CODE column having duplicate values. Therefore, if we compress the index by compressing just the CODE column:

SQL> alter index ziggy_code_id_i rebuild compress 1;

Index altered.

SQL> analyze index ziggy_code_id_i validate structure;

Index analyzed.

SQL> select height, lf_blks, br_blks, br_rows_len, btree_space, used_space from index_stats;

    HEIGHT    LF_BLKS    BR_BLKS BR_ROWS_LEN BTREE_SPACE USED_SPACE
---------- ---------- ---------- ----------- ----------- ----------
         3       4620         28      214696    37166416   33369357

We notice not only has the number of branch blocks reduced (28 down from 83), but more importantly, we have significantly reduced the number of overall leaf blocks (4620 down from 14125).

So if reducing the size of the resultant index is the aim, you will generally get a much better result by using basic index compression and ensuring the columns with the few distinct values are the leading columns, than by potentially moderately reducing branch blocks with the leading column more distinct.

The other advantage to placing the columns with fewer distinct values as the leading columns of an index is that it makes an Index Skip Scan a viable execution path if the leading column(s) is not referenced in a predicate. This is not possible if the leading column is too distinct. I’ve discussed Index Skip Scans previously in this blog.

Note basic index compression is free (you don’t need the Advanced Compression Option), but you do need to be on Enterprise Edition to use this feature.