
FIRST_ROWS_10 CBO Is Hopeless, It’s Using The Wrong Index !! (Weeping Wall) November 5, 2018

Posted by Richard Foote in ALL_ROWS, CBO, Exadata, FIRST_ROWS_10, Oracle Indexes, Siebel.
6 comments


There’s an organisation I had been dealing with on and off over the years who were having all sorts of issues with their Siebel System and who were totally convinced their performance issues were due directly to being forced to use the FIRST_ROWS_10 optimizer. I’ve attempted on a number of occasions to explain that their issues are not actually due to some unexplained deficiency with the FIRST_ROWS_10 CBO, but due to a number of other root issues, sadly to no avail. I recently found out they’re still struggling with performance issues, so I thought it might be worth looking at a classic example of where it looks simplistically like a FIRST_ROWS_10 CBO issue, but the “real” underlying problem(s) are actually quite different. Just in case other sites are likewise struggling to identify such SQL performance issues when using FIRST_ROWS_10…

This is a somewhat simplified version of their most common issue. Firstly, I create a table with 3M rows that has two columns of interest. The CODE column is initially populated with two evenly distributed distinct values, while the GRADE column initially has just the one distinct value.

SQL> create table bowie (id number not null, code number not null, grade number not null, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,2), 42, 'David Bowie'
from dual connect by level <= 3000000;

3000000 rows created.

SQL> commit;

Commit complete.

I then update a few rows (just 5) so that the CODE column now has a few occurrences of a third distinct value and update 5 other rows so the GRADE column has a few occurrences of a second distinct value:

SQL> update bowie set code=2
where id in (42, 4343, 400042, 1420001, 2000042);

5 rows updated.

SQL> commit;

Commit complete.

SQL> update bowie set grade=2
where id in (4212, 434323, 440423, 1440002, 2400642);

5 rows updated.

SQL> commit;

Commit complete.

We now introduce “a root problem” by not collecting histograms on these two columns, such that the CBO doesn’t recognise that the values in these columns are unevenly distributed. The CBO will incorrectly assume the rare CODE value actually occurs 1M times, as it assumes an even distribution across the three distinct values. Now this is NOT the specific root issue at this organisation, as they do gather histograms, but they do have numerous issues with the CBO not correctly determining the cardinality/selectivity of their SQL.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.
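Before creating any indexes, it’s worth confirming what the CBO now actually “sees” for these columns. A quick sanity check (the output below is indicative only):

SQL> select column_name, num_distinct, density, histogram
  2  from user_tab_col_statistics where table_name='BOWIE';

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM
------------ ------------ ---------- ---------
ID                3000000 3.3333E-07 NONE
CODE                    3 .333333333 NONE
GRADE                   2         .5 NONE
NAME                    1          1 NONE

With no histograms and 3 distinct CODE values, the CBO’s cardinality estimate for a CODE=2 predicate is simply 3000000 x 1/3 = 1000K rows.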

We next create indexes on these two CODE and GRADE columns:

SQL> create index bowie_code_i on bowie(code);

Index created.

SQL> create index bowie_grade_i on bowie(grade);

Index created.

Let’s now run the following query using the session default FIRST_ROWS_10 optimizer. The query basically returns just the 5 rows that have a CODE = 2, but sorts the result set by the GRADE column:

SQL> alter session set optimizer_mode=first_rows_10;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 3133133456

---------------------------------------------------------------------------------------------
| Id | Operation                   | Name          | Rows  | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |               |    10 |   240 |       4 (0) | 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | BOWIE         | 1000K |   22M |       4 (0) | 00:00:01 |
|  2 | INDEX FULL SCAN             | BOWIE_GRADE_I |    31 |       |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=2)

Statistics
----------------------------------------------------------
    1 recursive calls
    0 db block gets
17518 consistent gets
 5865 physical reads
    0 redo size
  858 bytes sent via SQL*Net to client
  572 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
    5 rows processed

The FIRST_ROWS_10 optimizer has come up with a terrible execution plan. Instead of using the index on the CODE column to quickly access the 5 rows of interest and then sort them, it uses an INDEX FULL SCAN via the GRADE column index.

This results in a massively inefficient execution plan (note the 17,518 consistent gets), as Oracle effectively has to read the entire table via this GRADE index to eventually find the 5 rows of interest that have a CODE=2.

The FIRST_ROWS_10 CBO certainly appears to be dreadful…

But before you go off and demand that Oracle not use this CBO, the key question to ask here is WHY? Why is the FIRST_ROWS_10 CBO deciding to use what is clearly the wrong index?

If we can understand why this is happening, perhaps we can then address what is clearly a problem with an appropriate solution, one that might not just fix this query but many, many others like it. And perhaps we can address this problem with an optimal solution, not with a band-aid fix or a sub-optimal solution that benefits just this one query.
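One quick way to confirm the CBO’s thinking is to hint the seemingly obvious index and compare the costings (the hint below simply forces the use of the CODE index for comparison purposes):

SQL> select /*+ index(bowie bowie_code_i) */ * from bowie where code=2 order by grade;

With the current statistics, you would expect the resulting plan to show an INDEX RANGE SCAN of BOWIE_CODE_I with an estimated 1000K rows, followed by an expensive SORT ORDER BY, at a cost vastly greater than the cost of 4 above. So the CBO isn’t ignoring the CODE index; with these statistics it genuinely costs it as the far more expensive option. But back to the original FIRST_ROWS_10 plan.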

Now there are actually two clues within this execution plan regarding what is really going on.

The first is that the execution plan estimates that 1000K rows are to be processed by the table access after the CODE=2 filter has been applied. But this is not correct: there are only 5 such rows.

The second clue that not all is right is that the CBO estimates 10 rows are to be retrieved via this FIRST_ROWS_10 access plan (as Oracle is trying to come up with the best plan to retrieve the first 10 rows as efficiently as possible). However, there are only 5 rows that meet the SQL criteria. The CBO is not picking up that fewer than the mandatory 10 rows will actually be fetched and hence need to be considered.

I always recommend a couple of things to look at if one ever comes across the scenario where the FIRST_ROWS(N) optimizer doesn’t appear to be behaving itself. The first is to look at a 10053 trace and see what the CBO costings are for the various alternative plans. The second is to simply run the query with the ALL_ROWS CBO to see what its initial deliberations might be, noting that the CBO has to perform an initial pass with ALL_ROWS to determine the data density of the various steps before it can accurately come up with the optimal FIRST_ROWS(N) costings. Without knowing the potential full result set, the FIRST_ROWS_10 optimizer wouldn’t be able to determine, for example, how much of a Full Index Scan would likely need to be processed before the necessary rows of interest are found.
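For reference, a 10053 trace can be generated along the following lines (note the statement must be hard parsed, e.g. by making its text unique, for the costings to actually be dumped to the trace file):

SQL> alter session set events '10053 trace name context forever, level 1';

Session altered.

SQL> select * from bowie where code=2 order by grade;

SQL> alter session set events '10053 trace name context off';

Session altered.

The resulting trace file in the diagnostic trace directory details the costings of each alternative plan the CBO considered.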

So let’s see what costings and plan we get with the ALL_ROWS CBO:

SQL> alter session set optimizer_mode=all_rows;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2027917145

------------------------------------------------------------------------------------
| Id | Operation         | Name  | Rows  | Bytes |TempSpc | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT  |       | 1000K |   22M |        |   11173 (8) | 00:00:01 |
|  1 | SORT ORDER BY     |       | 1000K |   22M |    34M |   11173 (8) | 00:00:01 |
|* 2 | TABLE ACCESS FULL | BOWIE | 1000K |   22M |        |   3387 (11) | 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("CODE"=2)

Statistics
----------------------------------------------------------
    0 recursive calls
    0 db block gets
11897 consistent gets
    0 physical reads
    0 redo size
  858 bytes sent via SQL*Net to client
  572 bytes received via SQL*Net from client
    2 SQL*Net roundtrips to/from client
    1 sorts (memory)
    0 sorts (disk)
    5 rows processed

The root issue now becomes somewhat obvious…

ALL_ROWS is not estimating that 5 rows are to be returned, but 1000K rows !! Oracle is not estimating that using the index on the CODE column will fetch just 5 rows, but that it would retrieve 1000K rows. Using the CODE index to access 1M rows would therefore be viewed as being much too expensive.

Importantly, the sort step would therefore not sort 5 rows, but would be required to sort 1000K rows, which would be extremely expensive.

Oracle considers all of this when deciding how best to access the first 10 rows of interest as efficiently as possible with the FIRST_ROWS_10 CBO.

Rather than using the CODE index to first retrieve all 1000K rows, to then sort all 1000K rows before finally being able to return the first 10 rows of interest, Oracle instead does the following.

It uses the index on the GRADE column to retrieve the first 10 rows of interest. As 1 in 3 of all rows is estimated to be of interest (1M out of the 3M rows, because we’re interested in 1 of the 3 distinct CODE values), it estimates it doesn’t actually have to perform much of the INDEX FULL SCAN to find these initial 10 rows of interest.
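To put some rough numbers on this: at an expected 1-in-3 hit rate, finding the first 10 matching rows should only require reading approximately 10 x 3 = 30 index entries before Oracle can stop, which is why the INDEX FULL SCAN step in the first plan above has an estimate of just 31 rows.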

As the GRADE index was accessed, it also means these first 10 rows will have been fetched in GRADE order. Therefore, there is no need to perform the SORT ORDER BY step, as the index guarantees the data is fetched in GRADE order. Not having to perform this sort makes this plan fantastically cheap compared to any other option that first requires all 1000K rows of data to be fetched and sorted.

The execution plan when using ALL_ROWS instead elects to perform a Full Table Scan (FTS) to efficiently access what the CBO thinks will be 1000K rows of interest. This would be much more efficient than accessing all 1000K rows via either the CODE index (followed by the sort) or the GRADE index (which avoids the sort, but requires the entire table to be accessed via the index).

Now for this organisation, this FTS is not an entirely bad thing. Why? Because they run Siebel on an Exadata platform !!

Exadata takes this FTS and performs a Smart Scan. And the associated Storage Index can automatically determine this data is extremely rare and potentially only access the relatively few storage regions within the table where these few values of interest reside.

The query goes from taking 60 seconds to run using the “awful” FIRST_ROWS_10 CBO to just 2 seconds with the “brilliant” ALL_ROWS CBO.

However, the “root issue” here is not the FIRST_ROWS_10 CBO but the fact it is being fed insufficient statistics to make an accurate estimate of the true cost. As with all CBOs, rubbish stats in, rubbish plan out…

If we fix the actual root issue and provide the CBO with the necessary statistics to make the correct cardinality/selectivity estimates (in this example by collecting histograms on the skewed data columns):

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', estimate_percent=>100, method_opt=>'FOR ALL COLUMNS SIZE 75');

PL/SQL procedure successfully completed.
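A quick check confirms what the CBO now sees (again, the output is indicative):

SQL> select column_name, histogram, num_buckets
  2  from user_tab_col_statistics
  3  where table_name='BOWIE' and column_name in ('CODE','GRADE');

COLUMN_NAME HISTOGRAM  NUM_BUCKETS
----------- ---------- -----------
CODE        FREQUENCY            3
GRADE       FREQUENCY            2

The frequency histograms capture that the value 2 occurs just 5 times in each of these columns.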

And now re-run the query again with ALL_ROWS:

SQL> alter session set optimizer_mode=all_rows;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2357877461

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |              |    5 |   120 |      5 (20) | 00:00:01 |
|  1 | SORT ORDER BY                       |              |    5 |   120 |      5 (20) | 00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE        |    5 |   120 |       4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE_CODE_I |    5 |       |       3 (0) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=2)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  8 consistent gets
  0 physical reads
  0 redo size
858 bytes sent via SQL*Net to client
572 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  5 rows processed

We notice that the ALL_ROWS CBO is now determining the correct query cardinality (5 rows) and is using the CODE index to retrieve the correctly estimated 5 rows. It’s happy to now perform the sort, as sorting 5 rows has a trivial cost (the overall cost goes up by just 1).

If we now run the query using the default session FIRST_ROWS_10 CBO:

SQL> alter session set optimizer_mode=first_rows_10;

Session altered.

SQL> select * from bowie where code=2 order by grade;

Execution Plan
----------------------------------------------------------
Plan hash value: 2357877461

-----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
-----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |              |    5 |   120 |      5 (20) | 00:00:01 |
|  1 | SORT ORDER BY                       |              |    5 |   120 |      5 (20) | 00:00:01 |
|  2 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE        |    5 |   120 |       4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN                    | BOWIE_CODE_I |    5 |       |       3 (0) | 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("CODE"=2)

Statistics
----------------------------------------------------------
  0 recursive calls
  0 db block gets
  8 consistent gets
  0 physical reads
  0 redo size
858 bytes sent via SQL*Net to client
572 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  1 sorts (memory)
  0 sorts (disk)
  5 rows processed

We note it’s also using the same execution plan as ALL_ROWS, as the FIRST_ROWS_10 CBO is likewise correctly determining that the CODE index is now a very efficient way to access just the 5 rows of interest.

Here’s the thing. If you are returning 10 or fewer rows, the optimal execution plan for both FIRST_ROWS_10 and ALL_ROWS should ultimately be the same, as they both should cost the associated plans the same way.

By correctly identifying and addressing the root issue here (poor cardinality/selectivity estimates), we get the following considerable benefits:

  • We now have an execution plan that doesn’t take 2 seconds to run, but 0.02 of a second (we are now down to just 8 consistent gets). This is much more efficient than the Exadata FTS and allows for the optimal plan to be selected, not just a better plan.
  • We automatically fix ALL execution plans for all queries that are based on this combination of table and filtering columns
  • We correctly understand and identify issues with any other table that likewise has the same costing issue
  • We don’t unnecessarily have to add ALL_ROWS hints or use ALL_ROWS based baselines to address all such related issues
  • We don’t implement a fix (such as baselines) that becomes ineffective if we were to even change the underlying SQL with any subsequent release
  • We don’t attempt to fix the relatively few problem queries with a global change (such as switching to the ALL_ROWS CBO) that can potentially impact as many queries negatively as it addresses
  • We don’t spend years demanding futilely that Oracle Support allow Siebel with ALL_ROWS based session settings

So if you’re running Siebel and having performance issues, don’t just assume it’s some deficiency with the FIRST_ROWS_10 CBO. Spend the time to get to the bottom of any root issues (e.g. CBO bugs with incorrect histogram costings for CHAR columns, missing statistics on small tables, poor default settings when returning empty result sets, Siebel bugs with Cartesian Joins, missing extended statistics, missing indexes, etc. etc.)…
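As a simple illustration of addressing just one of these root issues, missing extended statistics on correlated columns can be created along the following lines (a sketch only, the (CODE,GRADE) column group here is purely illustrative):

SQL> select dbms_stats.create_extended_stats(ownname=>null, tabname=>'BOWIE', extension=>'(CODE,GRADE)') from dual;

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=>'FOR ALL COLUMNS SIZE AUTO');

The CBO can then cost predicates on these columns in combination far more accurately.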

In a future post, I’ll explain why playing around with the unsupported _sort_elimination_cost_ratio parameter (again, always a bad idea when trying to address specific SQL tuning issues) is ultimately futile when trying to get FIRST_ROWS_10 to not use the clearly inefficient index that eliminates the sort…

Oracle Database 18c: Introduction to Memoptimized RowStore (Memory of a Free Festival) March 5, 2018

Posted by Richard Foote in 18c, 18c New Features, Memoptimized RowStore, Oracle Indexes.
2 comments


One of the potentially cool new features introduced in Oracle Database 18c is the Memoptimized RowStore, designed to improve the performance and scalability of key-value based queries.

A new pool in the SGA called the Memoptimize Pool can be configured to store specific heap tables that you may wish to optimize, in a manner not dissimilar to how you may wish to store specific tables in the Keep Pool. The difference with this new pool however is that not only are the table blocks stored in the memory pool, but 25% of the pool is configured to automatically create and store a hash index based on the Primary Key (PK) of the table.

Accesses to the table via an equality predicate on the PK can then use the in-memory hash index to very efficiently access the required row, eliminating any physical IOs. In addition, Exadata Exadirect Oracle Net Protocol can leverage Remote Direct Memory Access (RDMA) to allow clients to directly access the required memory regions over the network, avoiding both CPU and OS overheads. The result is potentially very fast and very scalable key-value based queries.

A new parameter, MEMOPTIMIZE_POOL_SIZE, is used to size the pool as required. Note this pool is part of the SGA and so needs to be carefully considered when sizing the overall SGA. Currently, in this first release of 18c, MEMOPTIMIZE_POOL_SIZE cannot be dynamically changed and is not automatically sized or tuned.
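A minimal sketch of configuring the pool (assuming a 1G allocation is appropriate for your environment; being a static parameter, an instance restart is required):

SQL> alter system set memoptimize_pool_size=1G scope=spfile;

System altered.

SQL> shutdown immediate
SQL> startup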

Once you’ve set the MEMOPTIMIZE_POOL_SIZE accordingly, you can then create or alter tables to take advantage of the new pool.

The following demo was run on Oracle Live SQL:

I first attempt to create a new table using the new MEMOPTIMIZE FOR READ clause to allocate the table to this new pool:

create table bowie (id number, code number, name varchar2(42))
memoptimize for read;

ORA-62156: MEMOPTIMIZE FOR READ feature not allowed on segment with deferred storage

So a new table cannot be created with the default deferred segment creation setting. Let’s try again:

create table bowie (id number, code number, name varchar2(42))
segment creation immediate memoptimize for read;

ORA-62142: MEMOPTIMIZE FOR READ feature requires NOT DEFERRABLE PRIMARY KEY constraint on the table

So to use the Memoptimized RowStore, a table must have a PK. These are the columns on which the automatic in-memory hash index is based.

create table bowie (id number constraint bowie_pk primary key, code number, name varchar2(42))
segment creation immediate memoptimize for read;

Table created.

Success. So this table has been created to use the Memoptimized RowStore.

You can also alter existing tables to make use of the Memoptimized RowStore:

create table bowie2 (id number constraint bowie2_pk primary key, code number, name varchar2(42));

Table created.
insert into bowie2 select rownum, mod(rownum,1000), 'DAVID BOWIE' from dual connect by level <=100000;

100000 row(s) inserted.

commit;

Statement processed.

alter table bowie2 memoptimize for read;

Table altered.

For tables that have just been altered to use the Memoptimized RowStore, you can load the table data into the pool via the new DBMS_MEMOPTIMIZE package:

exec dbms_memoptimize.populate(schema_name=>'SQL_GLGTFSMUBLHEUOFHPRXBRHITF', table_name=>'BOWIE');

ORA-62138: MEMOPTIMIZE memory area does not exist. ORA-06512: at "SYS.DBMS_MEMOPTIMIZE", line 10
ORA-06512: at "SYS.DBMS_MEMOPTIMIZE", line 113
ORA-06512: at line 1
ORA-06512: at "SYS.DBMS_SQL", line 1721

So it appears from the error message that the Memoptimized RowStore is not currently enabled on Oracle Live SQL.

Which makes some sense, as it would be very difficult to size the MEMOPTIMIZE_POOL_SIZE parameter appropriately if any old bugger could create objects in the pool.

So it means we can’t use the Oracle Live SQL environment to test and play around with this new feature, but it doesn’t stop us from exploring a little in how this feature would work.

Let’s now populate the original table:

insert into bowie select rownum, mod(rownum,1000), 'DAVID BOWIE' from dual connect by level <=100000;

100000 row(s) inserted.

commit;

Statement processed.

 

And collect statistics:

exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

Statement processed.

In much the same way as the Oracle Database In-Memory option can’t guarantee all objects assigned to the In-Memory pool will fit in the allocated pool set by INMEMORY_SIZE, the same goes for the Memoptimized RowStore. Therefore, the CBO can only assume the tables reside in the memory structures in which they’re allocated, and only at run time decide if it needs to access the required data elsewhere (e.g. via the Buffer Cache, via a database index rather than the in-memory hash index, etc.).

Note both the commands to create a table and to alter a table to use the Memoptimised RowStore worked even though the necessary pool doesn’t appear to have been allocated in the database.

So let’s investigate what the CBO does with a table assigned to the Memoptimized RowStore.

As the PK will automatically have an in-memory hash index created within the Memoptimized RowStore, does Oracle still automatically create a database index as well?

select index_name, blevel, leaf_blocks from user_indexes where table_name='BOWIE';

INDEX_NAME BLEVEL LEAF_BLOCKS
---------- ------ -----------
BOWIE_PK        1         187

So yes, an index is still automatically created to police the PK constraint if a table is assigned to the Memoptimized RowStore.

So let’s have a look at the execution plan for a query based on a PK equality predicate:

select * from bowie where id=42;

---------------------------------------------------------------------------------------------------
| Id | Operation                              | Name     | Rows | Bytes | Cost (%CPU) | Time      |
---------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                       |          |    1 |    20 |       2 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID READ OPTIM | BOWIE    |    1 |    20 |       2 (0) |  00:00:01 |
|* 2 | INDEX UNIQUE SCAN READ OPTIM           | BOWIE_PK |    1 |       |       1 (0) |  00:00:01 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID"=42)

So we have a couple of new execution plan steps here: INDEX UNIQUE SCAN READ OPTIM, which states the in-memory hash index will be accessed to find the location of the required row, and TABLE ACCESS BY INDEX ROWID READ OPTIM, which states the table will then be accessed via the rowid obtained from the in-memory hash index.

If we run a query that uses a BETWEEN predicate on the PK column:

select * from bowie where id between 42 and 50;

------------------------------------------------------------------------------------------------
| Id | Operation                           | Name     | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                    |          |   10 |   200 |       3 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE    |   10 |   200 |       3 (0) |  00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_PK |   10 |       |       2 (0) |  00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ID">=42 AND "ID"<=50)

We notice the Memoptimized RowStore is not used, just the database PK index via a standard Index Range Scan.

If we run a query that the CBO knows will only retrieve 2 rows via an IN clause:

select * from bowie where id in (42, 442);

-----------------------------------------------------------------------------------------
| Id | Operation                   | Name     | Rows | Bytes | Cost (%CPU) | Time      |
-----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |          |    2 |    40 |       3 (0) |  00:00:01 |
|  1 | INLIST ITERATOR             |          |      |       |             |           |
|  2 | TABLE ACCESS BY INDEX ROWID | BOWIE    |    2 |    40 |       3 (0) |  00:00:01 |
|* 3 | INDEX UNIQUE SCAN           | BOWIE_PK |    2 |       |       2 (0) |  00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("ID"=42 OR "ID"=442)

Again, the Memoptimized RowStore is not used, just the database PK index.

The Memoptimized RowStore is only used on queries with a single equality predicate on the table. Even a query with an equality predicate in combination with another predicate on the same table (odd as that may be) will result in the Memoptimized RowStore not being used:

select * from bowie where id=42 and name='DAVID BOWIE';

----------------------------------------------------------------------------------------
| Id | Operation                   | Name     | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |          |    1 |    21 |       2 (0) | 00:00:01 |
|* 1 | TABLE ACCESS BY INDEX ROWID | BOWIE    |    1 |    21 |       2 (0) | 00:00:01 |
|* 2 | INDEX UNIQUE SCAN           | BOWIE_PK |    1 |       |       1 (0) | 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("NAME"='DAVID BOWIE')
2 - access("ID"=42)

 

So the Memoptimized RowStore is only considered by the CBO for queries based on a singular PK=value predicate on a table.

More to come once I have my own database to play in 🙂

Indexing Seminar: Why Small Classes? (A Small Plot Of Land) October 5, 2017

Posted by Richard Foote in Index Internals, Index Internals Seminar, Oracle Indexes, Richard Foote Consulting, Richard's Musings.
4 comments


Firstly, a huge thank-you to everyone who has enrolled in one of my “Indexing Internals and Best Practices” seminars running soon in Australia. The feedback and overall reaction has been extremely positive and I can’t wait to present all the material I’ve been putting together these past few weeks.

There are currently limited places still available at each of the 6 venues, but places are selling out fast. All enrolment details can be found here: https://www.eventbrite.com.au/o/richard-foote-consulting-pty-ltd-15076769762

A question I’ve been asked a couple of times is why the insistence on only running small classes? I’ve previously run similar events overseas to large audiences: I’ve had classes of 40+ a number of times in Europe, and I ran a one day version of this at Hotsos last year to more people than I could count (I’m guessing 80+). And while they were all highly successful, I will be restricting classes to much smaller sizes on this Australian tour and into the future. There are a number of reasons for this, with the key points being:

  • The seminar has been fully revised and updated and is packed with heaps of new content. I want to be able to cover as much material as is practical and this is only possible with small classes. The more I cover, the more attendees have the opportunity to learn.
  • I really want these seminars to be interactive events, with attendees having plenty of opportunity to ask questions. This makes it a more valuable experience, as any ambiguities, doubts, environment specific questions, important bits that people don’t get first time, etc. can be clarified and addressed.
  • The seminar goes for 2 days, but if I had 50 people and each person asked 15 mins of questions that needed answering, that’s almost the entire time gone. So larger class sizes make it impractical to have any meaningful Q&A.
  • Attendees bring vast amounts of experiences and solutions to Oracle Indexing related scenarios, as all Oracle databases out there utilise indexes. Even on Exadata. Even in future autonomous databases. Small classes give me the opportunity to tap into this.
  • I want to learn just as much as everybody else and if I do all the talking, I’ll learn nothing new. Small classes give me the opportunity to learn, based on questions asked and experiences shared.
  • All attendees have their own requirements and expectations from attending the seminar. If someone has specific issues in their environment, what a wonderful opportunity to learn and perhaps get these issues addressed by attending the seminar. Again, only practical in small class environments.
  • I will be contacting all attendees BEFORE the seminars to determine what their individual expectations are from the seminars. I will then ensure that any appropriate material is fully covered to meet those expectations. This again is only practical with small class sizes.
  • Finally, you’re spending 2 days with the one and only Richard Foote 🙂 As well as learning more about indexes than you thought possible, as well as learning how an appropriate indexing strategy can dramatically improve database/application performance, as well as discovering an appreciation for the works of David Bowie, it’s also an opportunity for a sense of 1 on 1 with myself, knowing that I’m there to directly assist “you” with truly understanding how Oracle indexes should be properly utilised. There is little sense of 1 on 1 with large class sizes.

 

So these seminars are designed very intentionally to be “premium” events, offering a training experience that really is second to none. These seminars have always been very well received, so I’m truly excited that I’ll be running these events with the best content I’ve ever compiled. Small class settings give all attendees the very best opportunity to absorb it all and directly apply what they’ve learnt to improving the performance of their database/application environments.

Like I said, there are only limited places now remaining for the Australian Tour so get in early or else risk missing out.

Events in other parts of our beautiful planet are currently being planned and will be announced in the near future. If you have any questions or requests on where to run these seminars, just ask: richard@richardfooteconsulting.com

Oracle Indexing Myths (Telling Lies) September 26, 2017

Posted by Richard Foote in Oracle Indexes.
2 comments


I’m currently hard at work fully revising and updating my “Indexing Internals and Best Practices” seminar in time for a series of events I’ll be running in Australia in October/November. This has been a highly popular and acclaimed 2 day seminar that is a must for any DBA, Developer, Solutions Architect or anyone else interested in designing, developing or maintaining high performance Oracle-based databases/applications.

For a number of reasons, there are many myths in relation to how Oracle Indexes function, how they should be maintained and how and when indexes should be implemented. One of the goals of my seminar is to go through and dispel many of these myths and demonstrate how these misconceptions can unfortunately often result in inappropriate and inefficient indexing strategies being deployed. I have investigated numerous customer databases (including large scale Exadata sites) where the database is running in a severely sub-optimal manner as a direct result of inappropriate indexing implementations based on these myths.

Some of these misconceptions are just plain wrong, some are true in other databases but not Oracle, some might have once been true but not anymore, some might be mostly true but not always, some might be mostly false but not always, and it’s often these exceptions that can make such an important difference in how indexes could/should be deployed in some environments. Words such as “never”, “always”, “all” and “none” are dangerous to use when discussing technology, as exceptions, caveats and limitations can impact/distort the truth and how such technology should be implemented.

My use of the term “Best Practices” in the seminar title is actually a deliberate, somewhat cynical reuse of a term that actually gets many customers in strife, as what might be a so-called “Best Practice” in one environment might be totally inappropriate in another.

In no particular order, the following Oracle Indexing “myths” (or misleading generalisations) are discussed throughout my seminar and are still prevalent out there in the Oracle universe:

  • Oracle B-Tree indexes become “unbalanced” over time and need to be rebuilt
  • Deleted space in index is “deadwood” and over time requires index to be rebuilt
  • If index height greater than ‘x’, it becomes inefficient and needs to be rebuilt
  • If index grows to 2x its height, it is 2x more costly to use
  • PCTFREE enables space for index entries to grow within current leaf block
  • If index has a poor (very high) Clustering Factor, rebuild the index
  • To improve the Clustering Factor, you have to rebuild the underlying table
  • Clustering Factor “as good as it gets” after gathering 100% estimate index statistics
  • To improve performance, regularly rebuild indexes
  • You never have to rebuild an index to improve performance
  • Statistics from INDEX_STATS provides reliable metrics on when to rebuild an index
  • If delete rows as reported in INDEX_STATS > x%, rebuild
  • If delete rows as reported in INDEX_STATS < x%, don’t rebuild
  • Analyze Index Validate Structure is a safe method of collecting index metrics
  • Index rebuilds are inexpensive and unobtrusive
  • Primary/Unique Key constraints require a unique index
  • Drop/disable a constraint, and the unique index policing it is automatically dropped
  • All Foreign Key constraints must be indexed
  • Indexes should eliminate sorting
  • Only indexed columns require statistics
  • Bitmap Indexes only useful with low cardinality columns
  • A Bitmap Index will be used where a B-Tree index is not, for low cardinality columns
  • Null values are not indexed
  • Small tables (say < 100 rows) don’t benefit from indexing
  • Separating indexes from tables in tablespaces improves performance
  • Range scans not possible with Reverse Key indexes
  • Local indexes improve performance
  • Put most discriminating column first in concatenated indexes
  • If SQL references all columns in index, index column order is irrelevant
  • If leading column of index is not referenced in SQL, index not considered by CBO
  • Monitoring Indexes will highlight which indexes can be safely dropped
  • Indexing Tracking will highlight which indexes can be safely dropped
  • Index Compression makes indexes smaller
  • B-Tree Index only useful with high cardinality columns
  • Pointless indexing a column with one distinct value
  • If more than x% rows returned, index is inappropriate, where x% between 0 & 100
  • Full Table Scan more efficient than index range scan with table access, when returning 100% of data
  • The CBO cost is an internal value of no practical use for tuning/comparison purposes
  • Index is best solution to return 0% of data
  • You don’t need indexes in Exadata
  • Indexes less important in Exadata, as Storage Indexes can take over if database index is missing

It’s important to note it potentially takes just the one bad row inserted into a table, one inaccurate statistic, one index being used inappropriately, or one missing index not created during an application upgrade, to bring down a whole Oracle database, RAC nodes running on an Exadata included…

If you want to learn why the above statements are misleading and lots, lots, lots more, come and attend my Indexing Seminar. Tickets for the Australian events are on sale here: https://www.eventbrite.com.au/o/richard-foote-consulting-pty-ltd-15076769762

Dates for other regions will be announced in the near future: https://richardfooteconsulting.com/seminardates/

If you wish to express an interest in attending a seminar near you, please email: richard@richardfooteconsulting.com.

 

 

 

Let’s Talk Database: New Dates Released (You Can’t Talk) March 16, 2017

Posted by Richard Foote in Let's Talk Database, Oracle Indexes.
3 comments


Good news for all you Oracle Database folk in ANZ. I’ll be running a new series of “Let’s Talk Database” sessions at a venue near you in the coming months. I’ll be discussing two interesting topics: “Introduction to the Oracle Cloud for DBAs” and “Introduction to Oracle Database 12c Release 2”. The dates for this series and registration links are as follows:

Agenda

8:30 – 9:00am Registration and coffee
9:00 – 10:00am Introduction to Oracle Cloud for DBAs
10:00 – 10:15am Break
10:15 – 11:15am Introduction to Oracle Database 12c Release 2–Part I
11.15 – 11:30am Break
11:30 – 12:30pm Introduction to Oracle Database 12c Release 2–Part II
12:30 – 1:30pm Lunch, Networking and informal Q&A

Session Abstracts

Introduction to the Oracle Cloud for DBAs

The Oracle Cloud has many key features and benefits that would be of much interest to Oracle DBAs including increased flexibility and elasticity, better security and compliance, reduced complexity with regard to maintenance and patching, improved access to database features and faster database deployments. This session will introduce the various options and capabilities available when deploying Oracle Databases within the Oracle Public Cloud, such as database services and deployment options, database option bundling, backup/recovery and DR services, hybrid manageability features and database accessing options. Oracle Exadata Express Cloud Service will be used as an example of how to quickly deploy and access an Oracle Database in the cloud. We will also look at some options with regard to a Hybrid Cloud model, including deploying Cloud based infrastructure within your own data centre. Finally, we introduce the Oracle Cloud data centre which has recently been expanded in Sydney, Australia.

Introduction to Oracle Database 12c Release 2

Oracle Database 12c Release 2 continues Oracle’s innovation with its leading database with the introduction of many new exciting features and enhancements. This session will discuss in some detail many of these new 12.2 capabilities that make the Oracle Database the cloud database of choice, including improvements with the Database In-Memory option, new indexing compression and tracking features, new online maintenance operations such as online tablespace encryption, online table move, online table/index partition conversion, new pluggable database capabilities such as hot cloning and application containers, Oracle Sharding, Active Data Guard and RAC improvements, Big Data and Analytics innovations as well as several new developer features such as JSON support improvements, SQL enhancements and case insensitive support.

So lots to talk about and lots of opportunity to ask questions and network with your fellow Oracle professionals.

As usual, places are unfortunately limited, so contact your local venue and enrol ASAP to avoid disappointment.

Hope to see many of you at these upcoming events 🙂

Hotsos Symposium 2017: Feb 27 – Mar 2 Dallas, Texas (I’m Afraid of Americans) February 6, 2017

Posted by Richard Foote in Oracle Indexes.
3 comments


In just 3 weeks time, I’ll again have the pleasure of presenting at the awesome Hotsos Symposium in Dallas, Texas. It’s simply the best conference with regard to Oracle performance tuning anywhere.

This year, I’ll be presenting on a couple of topics:

New Index Features Introduced in Oracle Database 12c Release 2

Oracle Database 12c Release 2 is now available in both Oracle DBaaS and Exadata Express Cloud Services and will soon be generally available. Oracle 12.2 has introduced a number of extremely useful new indexing features and enhancements. These include cool capabilities such as tracking index usage, advanced index compression enhancements, deferred invalidation of cursors during index creation/rebuild, automatic index maintenance during new online operations and JSON indexing enhancements. These will all be discussed in much detail with examples on how they can be usefully deployed.

Real World Examples on how to use AWR Reports to Solve Performance Issues

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting as they contain so much database diagnostic information, they don’t even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.

 

Looking forward to catching up with a number of my Oracle friends. Hopefully you can make it, learning lots of really useful stuff in relation to Oracle Performance Tuning is guaranteed 🙂

Oracle Database 12c Release 2: New Indexing Features – Long Identifiers (Several Species of Small Furry Animals Gathered Together in a Cave and Grooving With a Pict) November 24, 2016

Posted by Richard Foote in 12c Rel 2, 12c Release 2 New Features, Long Identifiers, Oracle Indexes.
1 comment so far

Oracle Database 12c Release 2 has now been officially released (at least on the Oracle Database Cloud and Oracle Database Exadata Express Cloud Services).

In the coming weeks, I’ll be blogging about quite a number of new indexing features/capabilities/improvements that have been introduced in Oracle Database 12c Release 2. These include:

  • Advanced Index Compression
  • Tracking Index Usage
  • Online Table Move (and automatic Index maintenance)
  • Online Conversion to Partitioned Table (and Partitioned Indexes)
  • Deferred Invalidation of Cursors During Index Creation/Rebuild
  • Indexing JSON Improvements
  • Text Index Improvements

To start things off, how many times have you wanted to name an index something such as follows:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index this_index_will_be_used_to_get_data_from_the_bowie_table_in_sc
enarios_when_the_code_column_is_used_in_predicates_idx on bowie(code);
create index this_index_will_be_used_to_get_data_from_the_bowie_table_in_scenari
os_when_the_code_column_is_used_in_predicates_idx on bowie(code)
*
ERROR at line 1:
ORA-00972: identifier is too long

Only to get an error that you’ve exceeded the 30 character identifier limit.

Well, with 12.2, the maximum length of an identifier has been increased to 128 characters, so now creating such an index works:


SQL> create index this_index_will_be_used_to_get_data_from_the_bowie_table_in_scenarios_when_the_code_column_is_used_in_predicates_idx on 
bowie(code);

Index created.

SQL> select index_name, leaf_blocks, status from dba_indexes where table_name='BOWIE';

INDEX_NAME                                         LEAF_BLOCKS STATUS
-------------------------------------------------- ----------- --------
THIS_INDEX_WILL_BE_USED_TO_GET_DATA_FROM_THE_BOWIE        1939 VALID
_TABLE_IN_SCENARIOS_WHEN_THE_CODE_COLUMN_IS_USED_I
N_PREDICATES_IDX

 

The possibilities are endless:

SQL> alter table bowie add constraint the_primary_key_of_the_bowie_table_is_the_
id_column_so_please_stop_trying_to_insert_a_duplicate_id_value_dumbo primary key
(id);

Table altered.

SQL> insert into bowie values (42, 42, 'David Bowie');
insert into bowie values (42, 42, 'David Bowie')
*
ERROR at line 1:
ORA-00001: unique constraint
(BOWIE.THE_PRIMARY_KEY_OF_THE_BOWIE_TABLE_IS_THE_ID_COLUMN_SO_PLEASE_STOP_TRYING _TO_INSERT_A_DUPLICATE_ID_VALUE_DUMBO) violated

Developers can now be, shall we say, more “expressive” with the database objects they create …

Next Round Of ANZ “Let’s Talk Database” Events (July/August 2016) June 20, 2016

Posted by Richard Foote in Let's Talk Database, Oracle Indexes.
11 comments

I’ll be presenting the next round of “Let’s Talk Database” events around Australia and NZ this winter in July/August 2016. These are free events but due to limited places have often “sold out” in the past, so booking early is recommended to avoid disappointment.

All events run between 9:00am – 12:30pm and are followed by a networking lunch.

We always have a fun time at these events and feedback suggests everyone learns something, so I hope, if you’re lucky enough to be in the ANZ region, you can make it to your local event 🙂

Currently, the confirmed events and topics are as follows:

  • Sydney – Wednesday, 20 July 2016: Let’s Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: mitch.foster@oracle.com
  • Melbourne – Thursday, 21 July 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: johanne.sergeant@oracle.com SOLD OUT !!
  • Canberra – Thursday, 28 July 2016: Let’s Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: richard.foote@oracle.com
  • Wellington –  Wednesday, 10 August 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
  • Auckland – Thursday, 11 August 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
  • Perth – Wednesday, 17 August 2016: Let’s Talk Database: Intro To CBO Costings, Database Futures. To register, email: dennis.ward@oracle.com
  • Adelaide – Thursday, 18 August 2016: Let’s Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: brodie.james@oracle.com
  • Brisbane: Wednesday, 24 August 2016: Let’s Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: gordon.laing@oracle.com

 

Details of sessions.

Performance Diagnostics – How To Correctly Use AWR Reports

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting as they contain so much database diagnostic information, they don’t even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.

Time permitting, we’ll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH)  and SQL Extended Tracing and scenarios when they can prove to be invaluable in addressing database performance issues.

 

Introduction Into The CBO (Why Is The CBO doing a Full Table Scan?)

A common source of confusion is why the Oracle Cost Based Optimizer (CBO) has made the decision to use a full table scan, or to use that index and not the one that appears more appropriate. The costing figures that the CBO derives and displays in execution plans are often viewed as cryptic, which is unfortunate as these costings are key to CBO decisions. Often the attempt to resolve such issues is to set inappropriate database parameters, which introduces as many issues as it resolves. This session looks under the covers of the CBO to see how it precisely derives its various costings, and hence how to correctly troubleshoot and resolve inefficient queries when they arise. Bring a calculator, it might come in handy 🙂

Database Futures – Exadata Secret Sauce and Software on Silicon

Oracle has recently released the new generation of the Exadata machine, the X6-2. Not only is it bigger and faster, it also has a host of features and capabilities unique to the Exadata storage platform, including Smart Scans, Storage Indexes, Exafusion, Smart Fusion Block Transfer, Dual Format Columnar Flash Cache, I/O Latency Capping, Hybrid Columnar Compression, Sparse Snapshots and In-Memory Database Fault Tolerance. All these unique features and their enhancements with the new X6-2 will be discussed. Additionally, Oracle’s Software-in-Silicon features extend the concepts of co-engineering hardware and software inherent in engineered systems such as the Exadata. Embedding software which has traditionally sat in the database layer into the hardware layer can yield both performance and security beyond that available from a traditional CPU. This presentation also covers how Oracle Database 12c can leverage Software-in-Silicon features, where the Oracle Database is heading in the future and why you should care.

 

Oracle Database 12c – Built for Data Warehousing

The Oracle Database is the leading database in the market, but it might come as a surprise to some that it is also the leading database with respect to Data Warehousing, in terms of both sales and analyst rankings. The focus today is a detailed look at all the special database capabilities that make the Oracle Database the perfect platform for Data Warehouse type applications.

We explore the overall Data Warehousing Information Management architecture and how it relates to both data warehouses and data reservoirs. We then deep dive into standard database features such as Materialized Views, Dimensions, Bitmap Indexes, Query Re-Write, JSON in the database, Partitioning, Parallel Processing, Compression, Attribute Clustering and SQL for analysis, reporting, aggregation and pattern matching.

We then discuss more advanced Oracle database capabilities such as Data Mining, Predictive Analytics, R in the database, Database In-Memory, Big Data SQL, Spatial & Graph and the specific engineered systems capabilities that makes Oracle such an effective Data Warehouse platform, especially when combined with the enterprise capabilities already associated with the Oracle Database such as security, high availability, scalability and manageability.

New Oracle Cloud Offering – Indexing as a Service (IDXaaS) (I Pity The Fool) April 1, 2016

Posted by Richard Foote in 12c Rel 2, IDXaaS, Oracle Cloud, Oracle Indexes.
12 comments

This of course is an April Fools joke. Sorry !!

A very exciting announcement !! I’ve recently been promoted within Oracle Corporation to lead their brand new Oracle Cloud offering “Indexing as a Service” (IDXaaS) team, based on my previous work and experience in the indexing space. Yes, I’m both thrilled and excited to be involved in this new venture 🙂

Promising both improved levels of service and reductions in maintenance and administration overheads, this new Oracle Cloud offering will take all indexing related hassles from on-premise databases and address them in the Cloud. No more index rebuilds, no more concerns you don’t have the right indexes in place and importantly no more worrying that indexes are taking up valuable TBs of storage. Oracle indexes can now be safely and seamlessly migrated and stored in the Oracle Cloud.

Relational databases depend heavily on indexes to ensure data can be retrieved quickly and efficiently, but indexes can be very difficult to administer, can use considerable resources to maintain and consume considerable amounts of disk space. They can be so problematic that some folk actually devote an entire blog on the subject !! Imagine if all these issues and overheads can be taken away from you when administrating Oracle databases…

Index DDL will be enhanced in the 12c Release 2 Oracle Database to optionally specify a Cloud Domain Identifier (CDI) instead of a tablespace. So you will be able to either build or rebuild an index into the Oracle IDXaaS Cloud with syntax such as follows: CREATE INDEX bowie_idx on bowie(id) CLOUD LOCATION bowies_cloud_domain; This index will fully index the bowie table that exists in an on-premise database (or indeed a database that is also in the cloud) but the index itself will be created and maintained in the Oracle Cloud. The database is fully aware of the actual physical location of the index, as it is when created in a traditional tablespace, and will simply update the index structure as DML is applied to the base table, as it has always done. All Oracle indexing types and options will be fully supported, including Oracle Text and Spatial indexes.

Like indexes stored in a tablespace, these indexes will need to be rebuilt or coalesced every now and then, but you won’t need to worry about this as Oracle will now implicitly do this for you in the Oracle Cloud. I’m currently hiring a team of DBAs with expertise in the “art” of rebuilding indexes who will ensure all your indexes are rebuilt automatically for customers as necessary. By default, these operations will usually be performed on Sundays within the Oracle Cloud, but as indexes can be rebuilt online and as these indexes are no longer physically stored within your database, it’s an operation that is now performed automatically and seamlessly for you. Customers no longer need concern themselves with these types of index operations or when they occur, so while “we” work over the weekend, your DBAs can instead relax and keep their gardens nice and weed free. Please leave a comment on the blog if you think you have what it takes to effectively rebuild Oracle indexes and are interested in joining my elite IDXaaS team !!

For customers that choose to use the “Premium IDXaaS” offering, Oracle will automatically create or drop indexes as required in the Oracle Cloud. Customers no longer have to make the decision on what indexes should be created on which columns in which order; this will be performed automatically for you by Oracle’s IDXaaS. By running new advanced analysis routines on your database workloads, the Premium IDXaaS offering will create an index if it will help the efficiency of any of your current workloads. Conversely, if Oracle believes an index is redundant (for example if your database is running on Exadata), indexes will first be automatically made “invisible” and if there are no serious repercussions, will be automatically dropped 7 days later. DBAs and Developers will no longer need to worry about which indexes to create/drop, significantly reducing the complexities in managing both current and future Oracle database environments.

Oracle will also introduce a new package that will easily migrate all your existing indexes for you into the Oracle Cloud. The DBMS_INDEX_CLOUD package will automatically migrate all indexes for a specified Table/Partition/Schema/Pluggable Database/Database into the Oracle Cloud in a fully online, seamless operation. Once completed, associated indexes within “on premise” tablespaces will effectively be dropped and replaced with their Cloud equivalents. All that “expensive” storage that was previously tied up in storing all those Oracle indexes can now be freed up to store much more business rich data such as emails, selfie photos, David Bowie music catalogs, etc. Note that these significant storage savings will also translate into smaller backups, smaller Standby databases, smaller Production clones, etc. Importantly, the Oracle Cost Based Optimizer will be fully aware of the Cloud Indexes (as their definitions, statistics and other metadata are still stored within the database data dictionary) and will use the Cloud Indexes as appropriate. No changes to existing SQL are necessary; the CBO will decide to use an index in the cloud in the same manner as it chooses to use an index currently. There is also a new Cloud Index execution path option that allows data to be retrieved via the Cloud Index in a “good enough” manner which is much more efficient than a normal index scan, although it might not necessarily retrieve all the possible data that meets an SQL predicate. It effectively provides what we refer to as that “internet” data retrieval experience.
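A hypothetical migration call might go something like the following (the package name comes from the announcement above, but the procedure and every parameter are invented, much like the package itself):

SQL> exec dbms_index_cloud.migrate_indexes(ownname=>'BOWIE', cloud_domain=>'bowies_cloud_domain');

PL/SQL procedure successfully completed.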

Oracle Open World 2016 will be the time of the official release of this exciting new Oracle Cloud offering. I will be presenting a number of sessions at OOW16 on this new IDXaaS and there will be a number of live demos to demonstrate its unique capabilities and the simplicity of the migration process. However, there will be a number of announcements before then, including the opportunity to be involved in a beta customer program. There will also be a number of hands-on workshops being conducted globally, with customers getting the chance to see for themselves how easy it is to move database indexes into the Oracle Cloud.

Lastly and perhaps most exciting of all, this new IDXaaS capability will initially be a free option with all current Oracle databases from 12c Release 2 onwards. You will be able to migrate all your existing 12.2 Oracle database indexes onto the Oracle Cloud and they will be stored, maintained and administered for free as part of your current Oracle database licenses. All Oracle database editions will be supported. This free offer, however, is likely to end once the 12c Release 2 database is officially released, but the additional pay-as-you-go licensing constructs are proposed to be extremely competitive, with yearly “indexed” price increments. Also to be subsequently released will be Oracle IDXaaS support for most other commercial databases including DB2, SQL Server, NoSQL, MySQL, Sybase, Hana, Hadoop and Notepad.

So administering Oracle (and other) databases will soon change forever. Oracle indexes will no longer have to be a concern for customers as all database indexes can instead be stored in the Oracle Cloud via the new IDXaaS offering, allowing Oracle to automatically store and manage these tricky database objects for you. Not only will index rebuilds be a thing of the past, but so will all concerns regarding index storage, creation and maintenance. I’m very excited to be involved in this new undertaking and indeed with my move to Oracle HQ in San Francisco in the coming months (I’ll be up in the Clouds level of Building 1). I look forward to talking IDXaaS with you in the days to come 🙂

 

This of course is an April Fools joke. Sorry !!

New Round of Let’s Talk Database Events February 12, 2016

Posted by Richard Foote in Oracle Indexes.
3 comments

I’ll be presenting a new round of “Let’s Talk Database” events around Australia and NZ next month. These are free events but have often “sold out” in the past so booking early is recommended to avoid disappointment.

All events run between 9:00am – 12:30pm and are followed by a networking lunch.

Currently, the confirmed events are:

  • Sydney – Tuesday, 22 March 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: mitch.foster@oracle.com
  • Melbourne – Wednesday, 23 March 2016: Let’s Talk Oracle Database: Oracle 12c Database New Features. To register, email: johanne.sergeant@oracle.com SOLD OUT !!
  • Canberra – Thursday, 31 March 2016: Let’s Talk Database: Intro To CBO Costings, APEX New Features and Software-on-Silicon. To register, email: richard.foote@oracle.com
  • Perth – Thursday, 5 May 2016: Let’s Talk Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: dennis.ward@oracle.com
  • Adelaide – Friday, 6 May 2016: Let’s Talk Database: Intro To CBO Costings, Database Futures. To register, email: brodie.james@oracle.com
  • Wellington – Wednesday, 11 May 2016: Let’s Talk Oracle Database: Oracle 12c Database New Features. To register, email: tanya.fea@oracle.com
  • Brisbane – Wednesday, 18 May 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: gordon.laing@oracle.com

 

More “Let’s Talk Database” events will be added in the coming days.

Details of sessions:

Oracle 12c Database New Features

An in-depth technical look at many of the exciting enhancements and new capabilities introduced in the Oracle 12c Database that help make life easier for the DBA and Oracle Developer. Among a number of topics, we’ll have a look at changes with respect to the Cost Based Optimizer, new statistics gathering options, discuss many of the new indexing related capabilities such as Partial Indexes, Advanced Index Compression, Asynchronous Global Index Maintenance, etc. and look at many of those cool little things such as In-Database Archiving, Automatic Data Optimization, Identity Columns, Row-Limiting Clauses, Temporary Undo, Online Move operations, Clustering Attributes, Temporal and Improved Flashback Queries, to name but a few that can sometimes make all the difference. Might even squeeze in a bit of Database In-Memory. So plenty to talk about !!!
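To give just a quick taste of the sort of material covered, the new 12c row-limiting clause for example makes Top-N queries trivial compared to the old ROWNUM gymnastics (here using the BIG_BOWIE demo table from my earlier Zone Map articles):

SQL> select * from big_bowie order by total_sales desc fetch first 10 rows only;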

Performance Diagnostics – How To Correctly Use AWR Reports

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting as they contain so much database diagnostic information that they don’t even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.

Time permitting, we’ll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH) and SQL Extended Tracing, and the scenarios in which they can prove invaluable in addressing database performance issues.
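(As a quick practical pointer for anyone wanting to follow along at home, a standard AWR report is generated from SQL*Plus via the supplied awrrpt.sql script, remembering of course that AWR requires the Diagnostics Pack license):

SQL> @?/rdbms/admin/awrrpt.sql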

Intro To CBO Costings, APEX New Features and Software-on-Silicon

A common issue is confusion about why the Oracle Cost Based Optimizer (CBO) has made the decision to use a full table scan or to use that index and not the one that appears more appropriate. The costing figures that the CBO derives and displays in execution plans are often viewed as cryptic, which is unfortunate as these costings are key to CBO decisions. Often the attempt to resolve such issues is to set inappropriate database parameters that introduce as many issues as they resolve. This session looks under the covers of the CBO to see how it precisely derives its various costings and hence how to correctly troubleshoot and resolve inefficient queries. Bring a calculator, it might come in handy 🙂
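To give a small taste of those costings, the CBO’s basic index range scan cost (ignoring CPU costing) essentially boils down to:

cost = blevel + ceil(leaf_blocks x effective index selectivity) + ceil(clustering_factor x effective table selectivity)

which goes a long way to explaining why the Clustering Factor is so often the dominant term in whether an index gets used.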

Oracle Application Express (APEX) is a rapid development environment built into the Oracle database. APEX 5 is the latest and largest release of Application Express. This APEX session will include an overview of APEX concentrating on new features, followed by a live demonstration. The session will also look at how easy it is to deploy APEX applications in the cloud.

Oracle’s Software-on-Silicon features extend the concepts of co-engineering hardware and software inherent in engineered systems such as Exadata. Embedding software that has traditionally sat in the database layer into the hardware layer can yield both performance and security beyond that available from a traditional CPU. This presentation covers how Oracle Database 12c can leverage Software-on-Silicon features, and how third-party developers can make use of the new features within their own applications.

 

Hope to see you at one of these events 🙂

Presenting the Hotsos Symposium Training Day – 10 March 2016 (Heat) September 16, 2015

Posted by Richard Foote in Oracle Indexes.
4 comments

I’ve just accepted an invitation to present the Hotsos Symposium Training Day on 10 March 2016 in sunny Dallas, Texas. In the age of Exadata and In-Memory databases, it’ll be an updated and consolidated version of my Index Internals and Best Practices seminar. With an emphasis on using indexes appropriately to boost performance, it’ll feature lots of tips, tricks and references to David Bowie 🙂

The Hotsos Symposium (6-10 March 2016), with its focus on performance analysis, is one of the very best conferences I’ve ever presented at and I’m very excited at the prospect of returning next year. The confirmed list of presenters is already impressive: Tanel Põder, Bryn Llewellyn, Kellyn Pot’Vin, Jim Czuprynski, Jeff Smith, Kerry Osborne, Carlos Sierra and now Richard Foote, as I’ll likely present a session or two during the conference proper as well.

If you get the chance, do yourself a favour and join me at one of the best Oracle conferences available and hopefully stay on for the Training Day where there’ll be lots of learning and fun to be had !!

12.1.0.2 Introduction to Zone Maps Part III (Little By Little) November 24, 2014

Posted by Richard Foote in 12c, Attribute Clustering, Oracle Indexes, Zone Maps.
1 comment so far

I’ve previously discussed the new Zone Map database feature and how Zone Maps work in a similar manner to Exadata Storage Indexes.

Just like Storage Indexes (and conventional indexes for that matter), they work best when the data is well clustered in relation to the Zone Map or index. By having the data in the table ordered in the same manner as the Zone Map, the ranges of the min/max values for each 8M “zone” in the table can be kept as narrow as possible, making them more likely to eliminate zone accesses.

On the other hand, if the data in the table is not well clustered, then the min/max ranges within the Zone Map can be extremely wide, making their effectiveness limited.

In my previous example on the ALBUM_ID column in my first article on this subject, the data was extremely well clustered and so the associated Zone Map was very effective. But what if the data is poorly clustered ?

To illustrate, I’m going to create a Zone Map based on the poorly clustered ARTIST_ID column, which has its values randomly distributed throughout the whole table:

SQL> create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id);
 create materialized zonemap big_bowie_artist_id_zm on big_bowie(artist_id)
 *
 ERROR at line 1:
 ORA-31958: fact table "BOWIE"."BIG_BOWIE" already has a zonemap
 "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" on it

Another difference between an index and a Zone Map is that there can only be the one Zone Map defined per table, although a Zone Map can include multiple columns. As I already have a Zone Map defined on just the ALBUM_ID column, I can’t simply create another.

So I’ll drop the current Zone Map and create a new one based on both the ARTIST_ID and ALBUM_ID columns:

SQL> drop materialized zonemap big_bowie_album_id_zm;

Materialized zonemap dropped.

SQL> create materialized zonemap big_bowie_zm on big_bowie(album_id, artist_id);

Materialized zonemap created.
    
 SQL> select measure, position_in_select, agg_function, agg_column_name
 from dba_zonemap_measures where zonemap_name='BIG_BOWIE_ZM';

MEASURE              POSITION_IN_SELECT AGG_FUNCTION  AGG_COLUMN_NAME
 -------------------- ------------------ ------------- --------------------
 "BOWIE"."BIG_BOWIE".                  5 MAX           MAX_2_ARTIST_ID
 "ARTIST_ID"

"BOWIE"."BIG_BOWIE".                  4 MIN           MIN_2_ARTIST_ID
 "ARTIST_ID"

"BOWIE"."BIG_BOWIE".                  3 MAX           MAX_1_ALBUM_ID
 "ALBUM_ID"

"BOWIE"."BIG_BOWIE".                  2 MIN           MIN_1_ALBUM_ID
 "ALBUM_ID"

So this new Zone Map has min/max details on each zone in the table for both the ARTIST_ID and ALBUM_ID columns.

The min/max ranges of a Zone Map provide an excellent visual representation of the clustering of the data. If I select the Zone Map details of the ALBUM_ID column (see partial listing below):

SQL> select zone_id$, min_1_album_id, max_1_album_id, zone_rows$ from big_bowie_zm;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_ROWS$
 ---------- -------------- -------------- ----------
 3.8586E+11              1              2      66234
 3.8586E+11              5              6      56715
 3.8586E+11              7              7      76562
 3.8586E+11              7              8      76632
 3.8586E+11              8              9      76633
 3.8586E+11             21             22      75615
 3.8586E+11             29             29      75582
 3.8586E+11             31             32      75545
 3.8586E+11             35             36      75617
 3.8586E+11             43             44      75615
 ...

3.8586E+11             76             77      75615
 3.8586E+11             79             80      75615
 3.8586E+11             86             87      75616
 3.8586E+11             88             89      75618
 3.8586E+11             97             97      75771
 3.8586E+11            100            100      15871

134 rows selected.

As the data in the table is effectively ordered based on the ALBUM_ID column (and so is extremely well clustered in relation to this column), the min/max ranges for each zone are extremely narrow. Each zone basically only contains one or two different values of ALBUM_ID, so if I’m just after a specific ALBUM_ID value, the Zone Map is very effective in eliminating zones from having to be accessed. Just what we want.

However, if we look at the Zone Map details of the poorly clustered ARTIST_ID column (again just a partial listing):

SQL> select zone_id$, min_2_artist_id, max_2_artist_id, zone_rows$ from big_bowie_zm;

ZONE_ID$ MIN_2_ARTIST_ID MAX_2_ARTIST_ID ZONE_ROWS$
 ---------- --------------- --------------- ----------
 3.8586E+11            3661           98244      66234
 3.8586E+11               1          100000      56715
 3.8586E+11            5273           81834      76562
 3.8586E+11               1          100000      76632
 3.8586E+11               1          100000      76633
 3.8586E+11               1          100000      75615
 3.8586E+11            2383           77964      75582
 3.8586E+11               1          100000      75545
 3.8586E+11               1          100000      75617
 3.8586E+11               1          100000      75615
 ...

3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75615
 3.8586E+11               1          100000      75616
 3.8586E+11               1          100000      75618
 3.8586E+11            4848           80618      75771
 3.8586E+11           84130          100000      15871

134 rows selected.

We notice the ranges for most of the zones are extremely large, with many actually having a min value of 1 (the actual minimum) and a max of 100000 (the actual maximum). This is a worst case scenario as a specific required value could potentially reside in most of the zones, thereby forcing Oracle to visit most zones and making the Zone Map totally ineffective.

If we run a query searching for a specific ARTIST_ID:

SQL> select * from big_bowie where artist_id=42;

100 rows selected.

Elapsed: 00:00:00.69

Execution Plan
 ----------------------------------------------------------
 Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
 | Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
 ----------------------------------------------------------------------------------------------------
 |   0 | SELECT STATEMENT                       |           |    99 |  9108 |  3291  (13)| 00:00:01 |
 |*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |    99 |  9108 |  3291  (13)| 00:00:01 |
 ----------------------------------------------------------------------------------------------------
 Predicate Information (identified by operation id):
 ---------------------------------------------------

1 - storage("ARTIST_ID"=42)
 filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
 BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ARTIST_ID" > :1 OR
 zm."MAX_2_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ZM" zm WHERE
 zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
 "ARTIST_ID"=42)

      
 Statistics
 ----------------------------------------------------------
 141  recursive calls
 0  db block gets
 101614  consistent gets
 0  physical reads
 0  redo size
 5190  bytes sent via SQL*Net to client
 618  bytes received via SQL*Net from client
 8  SQL*Net roundtrips to/from client
 0  sorts (memory)
 0  sorts (disk)
 100  rows processed

We notice we are forced to perform a very high number of consistent gets (101,614) when returning just 100 rows, much higher than the 2,364 consistent gets required to return a full 100,000 rows for a specific ALBUM_ID and not far from the 135,085 consistent gets when performing a full table scan.

We need to improve the performance of these queries based on the ARTIST_ID column …

Let’s drop this zone map:

SQL> drop materialized zonemap big_bowie_zm;

Materialized zonemap dropped.

and change the physical clustering of the data in the table so that the data is primarily now clustered in ARTIST_ID order:

 

SQL> alter table big_bowie add clustering by linear order(artist_id, album_id) with materialized zonemap;

Table altered.

So we have added a clustering attribute to this table (previously discussed here) and based a new Zone Map on this clustering at the same time.

SQL> select zonemap_name from dba_zonemaps where fact_table='BIG_BOWIE';

ZONEMAP_NAME
---------------
ZMAP$_BIG_BOWIE

SQL> select zonemap_name, pruning, with_clustering, invalid, stale, unusable
from dba_zonemaps where zonemap_name = 'ZMAP$_BIG_BOWIE';

ZONEMAP_NAME    PRUNING  WITH_CLUSTERING INVALID STALE   UNUSABLE
--------------- -------- --------------- ------- ------- --------
ZMAP$_BIG_BOWIE ENABLED  YES             NO      NO      NO

However, as we haven’t actually reorganized the table, the rows in the table are still clustered the same as before:

SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$
---------- -------------- -------------- ----------
3.8586E+11             43             44      75615
3.8586E+11              1              2      66234
3.8586E+11             81             82      75615
3.8586E+11             29             29      75582
3.8586E+11             50             50      75481
3.8586E+11             90             91      75484
3.8586E+11              5              6      56715
3.8586E+11              7              8      76632
3.8586E+11              8              9      76633
3.8586E+11             16             16      75481
...

3.8586E+11             44             44      75480
3.8586E+11             82             83      75616
3.8586E+11            100            100      15871
3.8586E+11             34             35      75576
3.8586E+11             14             15      75615
3.8586E+11             33             34      75616
3.8586E+11              3              5      75707

134 rows selected.

SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$
---------- --------------- --------------- ----------
3.8586E+11               1          100000      75545
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75617
3.8586E+11               1          100000      75911
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75616
3.8586E+11               1          100000      75615
3.8586E+11               1          100000      75616
3.8586E+11             132           75743      75612
3.8586E+11               1          100000      75615
...

3.8586E+11               1          100000      66296
3.8586E+11               1          100000      75615
3.8586E+11            2360           96960      75701
3.8586E+11               1          100000      75615
3.8586E+11               1          100000      75616
3.8586E+11           23432           98911      75480
3.8586E+11               1          100000      75791
3.8586E+11           21104           96583      75480

134 rows selected.

But if we now reorganise the table so that the clustering attribute can take effect:

SQL> alter table big_bowie move;

Table altered.

We notice the characteristics of the Zone Map have changed dramatically. The previously well clustered ALBUM_ID now has a totally ineffective Zone Map, with all the ranges effectively consisting of the full min/max values:

SQL> select zone_id$, min_2_album_id, max_2_album_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_2_ALBUM_ID MAX_2_ALBUM_ID ZONE_ROWS$
---------- -------------- -------------- ----------
3.9704E+11              1            142      21185
3.9704E+11              1            100       9452
3.9704E+11              1            100      76516
3.9704E+11              1            100      75501
3.9704E+11              1            100      75497
3.9704E+11              1            100      75501
3.9704E+11              1            100      75499
3.9704E+11              1            100      75504
3.9704E+11              1            100      75500
3.9704E+11              1            100      75501
...

3.9704E+11              1            100      75503
3.9704E+11              1            100      75498
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75501
3.9704E+11              1            100      75794

144 rows selected.

Meanwhile, the previously ineffective Zone Map on the ARTIST_ID column is now much more effective, with significantly smaller min/max ranges for each zone:

SQL> select zone_id$, min_1_artist_id, max_1_artist_id, zone_rows$ from zmap$_big_bowie;

ZONE_ID$ MIN_1_ARTIST_ID MAX_1_ARTIST_ID ZONE_ROWS$
---------- --------------- --------------- ----------
3.9704E+11              67            1036      21185
3.9704E+11            2359            2453       9452
3.9704E+11            8341            9106      76516
3.9704E+11           18933           19688      75501
3.9704E+11           22708           23463      75497
3.9704E+11           26483           27238      75501
3.9704E+11           27238           27993      75499
3.9704E+11           33278           34033      75504
3.9704E+11           36674           40449      75500
3.9704E+11           38563           39318      75501
...

3.9704E+11           49888           50643      75503
3.9704E+11           62723           63478      75498
3.9704E+11           77824           78579      75501
3.9704E+11           82354           83109      75501
3.9704E+11           88394           89149      75501
3.9704E+11           93679           94434      75501
3.9704E+11           98211           98969      75794

144 rows selected.

The same query now runs so much faster as the Zone Map can eliminate almost all zones from being accessed:

SQL> select * from big_bowie where artist_id=42;

100 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |    99 |  9108 |  3291  (13)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |    99 |  9108 |  3291  (13)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ARTIST_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ARTIST_ID" > :1 OR
zm."MAX_1_ARTIST_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE
zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
"ARTIST_ID"=42)
Statistics
----------------------------------------------------------
187  recursive calls
0  db block gets
175  consistent gets
0  physical reads
0  redo size
5190  bytes sent via SQL*Net to client
618  bytes received via SQL*Net from client
8  SQL*Net roundtrips to/from client
9  sorts (memory)
0  sorts (disk)
100  rows processed

Consistent gets has reduced dramatically down to just 175 from the previously massive 101,614.

As is common with changing the clustering of data, what improves one thing can make something else significantly worse. The previously efficient accesses based on the ALBUM_ID column are now nowhere near as efficient as before:

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:01.27

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_2_ALBUM_ID" > :1 OR
zm."MAX_2_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."ZMAP$_BIG_BOWIE" zm WHERE
zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)

Statistics
----------------------------------------------------------
187  recursive calls
0  db block gets
141568  consistent gets
0  physical reads
0  redo size
4399566  bytes sent via SQL*Net to client
73878  bytes received via SQL*Net from client
6668  SQL*Net roundtrips to/from client
9  sorts (memory)
0  sorts (disk)
100000  rows processed

We now have to perform a whopping 141,568 consistent gets up from the previous 2,364 consistent gets.

So Zone Maps, like database indexes and Exadata Storage Indexes, can be extremely beneficial in reducing I/O, but their effectiveness is very much dependent on the clustering of the underlying data.

12.1.0.2 Introduction to Zone Maps Part II (Changes) October 30, 2014

Posted by Richard Foote in 12c, Exadata, Oracle Indexes, Zone Maps.
1 comment so far

In Part I, I discussed how Zone Maps are new index-like structures, similar to Exadata Storage Indexes, that enable the “pruning” of disk blocks during accesses of the table by storing the min and max values of selected columns for each “zone” of a table, a zone being a range of contiguous (8M) blocks.

I showed how a Zone Map was relatively tiny but very effective in reducing the number of consistent gets for a well clustered column (ALBUM_ID).

In this post, we’re going to continue with the demo and look at what happens when we update data in the table with a Zone Map in place.

So let’s update the ALBUM_ID column (which currently has a Zone Map defined) for a few rows. The value of ALBUM_ID was previously 1 for all these rows (the full range of values is currently between 1 and 100) but we’re going to update them to 142:

SQL> update big_bowie set album_id=142 where id between 1 and 100;

100 rows updated.

SQL> commit;

Commit complete.

 

So the maximum value of ALBUM_ID is now 142, not 100. If we look at the maximum value as currently listed in the Zone Map:

 

SQL> select max(max_1_album_id) from  big_bowie_album_id_zm;

MAX(MAX_1_ALBUM_ID)
-------------------
100

 

We notice the maximum is still defined as being 100. So the update on the table has not actually updated the contents of the Zone Map. This is a big difference between Zone Maps and conventional indexes: indexes are automatically updated during DML operations, whereas Zone Maps are not (unless the REFRESH ON COMMIT option is specified).
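For reference, a Zone Map that keeps itself current across transactions would have the refresh clause spelled out at creation time, something along these lines (a sketch only; the clause placement is as I read the 12.1.0.2 syntax diagram, so do double-check):

SQL> create materialized zonemap big_bowie_album_id_zm
refresh on commit
on big_bowie(album_id);

Materialized zonemap created.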

If we look at the state of Zone Map entries that have a minimum of 1 (the previous values of ALBUM_ID before the update):

SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1              2           0           1      65787
3.8586E+11              1              2           0           0      66223

 

We notice that one of the entries has a status of 1, meaning that a specific zone has been marked as stale. However, all the other zones are still OK.

If we look at the status of the overall Zone Map:

SQL> select zonemap_name, pruning, refresh_mode, invalid, stale, unusable
from dba_zonemaps where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

ZONEMAP_NAME              PRUNING  REFRESH_MODE      INVALID STALE   UNUSABLE
------------------------- -------- ----------------- ------- ------- --------
BIG_BOWIE_ALBUM_ID_ZM     ENABLED  LOAD DATAMOVEMENT NO      NO      NO

 

We notice that the Zone Map is still “hunky dory” after the update.

If we now re-run the query we ran in Part I:

 

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)
Statistics
----------------------------------------------------------
141  recursive calls
0  db block gets
3238  consistent gets
0  physical reads
0  redo size
3130019  bytes sent via SQL*Net to client
761  bytes received via SQL*Net from client
21  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

 

We see the Zone Map was still used by the CBO. The number of consistent gets has increased (up from 2364 to 3238) as we now have to additionally access all the blocks associated with this stale zone, but it’s still more efficient than reading all the blocks from the entire table.

If we want to remove the stale zone entries, we can refresh the Zone Map or rebuild it (for ON DEMAND refresh):

 

SQL> alter materialized zonemap big_bowie_album_id_zm rebuild;

Materialized zonemap altered.

 

If we now look at the Zone Map entry:

 

SQL> select * from big_bowie_album_id_zm where min_1_album_id = 1;

ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_LEVEL$ ZONE_STATE$ ZONE_ROWS$
---------- -------------- -------------- ----------- ----------- ----------
3.8586E+11              1              2           0           0      66234
3.8586E+11              1            142           0           0      65787
3.8586E+11              1              2           0           0      66223

 

We see that the entry is no longer stale and now correctly reflects the actual maximum value within the zone (142).

If we now re-run the query:

SQL> select * from big_bowie where album_id = 42;

100000 rows selected.

Elapsed: 00:00:00.30

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND "ALBUM_ID"=42)
Statistics
----------------------------------------------------------
141  recursive calls
0  db block gets
3238  consistent gets
0  physical reads
0  redo size
3130019  bytes sent via SQL*Net to client
761  bytes received via SQL*Net from client
21  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

 

We notice nothing has appreciably changed; the Zone Map is still being used but the number of consistent gets remains the same as before. Why haven’t we returned to our previous 2364 consistent gets ?

Well, as the range of possible values within the updated zone is now between 1 and 142, the required value of 42 could potentially be found within this zone and so still needs to be accessed just in case. We know that the value of 42 doesn’t exist within this zone, but Oracle has no way of knowing this based on the possible 1 to 142 range.

Hence Zone Maps work best when the data is well clustered and the min/max ranges of each zone can be used to limit which zones need to be accessed. If the data were not well clustered and the values within each zone spanned most of the overall min/max range, then Oracle wouldn’t be able to effectively prune many (or any) zones and the Zone Map would be useless.

As we’ll see in Part III 🙂

12.1.0.2 Introduction to Zone Maps Part I (Map Of The Problematique) September 3, 2014

Posted by Richard Foote in Uncategorized.
11 comments

Zone Maps are new index-like structures that enable the “pruning” of disk blocks during accesses of the table by storing the min and max values of selected columns for each “zone” of a table. A zone is simply a range of contiguous blocks within a table. Zone Maps are similar in concept to Exadata Storage Indexes but are physical objects that can be explicitly controlled and maintained. Additionally, in a similar manner to Bitmap-Join indexes, they can also be defined on a table with an outer join to other tables, with the Zone Map attributes based on columns from these joined tables. They can also be created as part of the new Attribute Clustering capabilities I introduced in my last post.
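To illustrate the join capability, a join Zone Map is defined via materialized view style syntax, grouping on SYS_OP_ZONE_ID. A minimal sketch, assuming a purely hypothetical ALBUMS dimension table with a RELEASE_YEAR column (neither features in the demo below):

SQL> create materialized zonemap big_bowie_join_zm as
select sys_op_zone_id(b.rowid), min(a.release_year), max(a.release_year)
from big_bowie b left outer join albums a on (b.album_id = a.id)
group by sys_op_zone_id(b.rowid);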

Before getting too excited though, Zone Maps unfortunately require the Partitioning Option AND either Exadata or SuperCluster. If you’re still interested, read on 🙂

I’ll start by creating and populating my BIG_BOWIE table. For this specific post, the key thing to note here is that the ALBUM_ID column is very well clustered.

SQL> create table big_bowie (id number not null, album_id number not null, artist_id number not null, format_id number,
release_date date, total_sales number, description varchar2(100));

Table created.

SQL> create sequence bowie_seq order;

Sequence created.

SQL> create or replace procedure pop_big_bowie as
begin
    for v_album_id in 1..100 loop
        for v_artist_id in 1..100000 loop
            insert into big_bowie values (bowie_seq.nextval, v_album_id, v_artist_id,
                ceil(dbms_random.value(0,5)) * 2, trunc(sysdate-ceil(dbms_random.value(0,10000))),
                ceil(dbms_random.value(0,500000)),
                'THE RISE AND FALL OF ZIGGY STARDUST AND THE SPIDERS FROM MARS');
        end loop;
        commit;
    end loop;
    commit;
end;
/

Procedure created.

SQL> exec pop_big_bowie

PL/SQL procedure successfully completed.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BIG_BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

I’ll next create a standard b-tree index on the ALBUM_ID column:

SQL> create index big_bowie_album_id_i on big_bowie(album_id);

Index created.

If I run the following query looking for just specific value of ALBUM_ID (1% of the data):

SQL> select * from big_bowie where album_id=42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 1830705794

------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                 | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                      |   100K|  8984K|  1554   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BIG_BOWIE            |   100K|  8984K|  1554   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BIG_BOWIE_ALBUM_ID_I |   100K|       |   201   (2)| 00:00:01 |
------------------------------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ALBUM_ID"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       1590  consistent gets
          0  physical reads
          0  redo size
    9689464  bytes sent via SQL*Net to client
        760  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

The CBO used the index because of the excellent clustering of the data and performed a relatively low 1590 consistent gets (note arraysize was set to 5000).

If I make the index invisible and re-run the query a number of times:

SQL> alter index big_bowie_album_id_i invisible;

Index altered.

SQL> select * from big_bowie where album_id=42;

100000 rows selected.

Elapsed: 00:00:00.29

Execution Plan
----------------------------------------------------------
Plan hash value: 469213804

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
---------------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage("ALBUM_ID"=42)
       filter("ALBUM_ID"=42)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
     135085  consistent gets
     135068  physical reads
          0  redo size
    3130019  bytes sent via SQL*Net to client
        760  bytes received via SQL*Net from client
         21  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
     100000  rows processed

 

We notice execution times are almost identical despite the Full Table Scan (FTS), as Exadata Storage Indexes (SI) have kicked in. With the data being so well clustered, the SI has been able to very effectively “prune” the physical blocks that need to be accessed. This is clearly highlighted by looking at the session statistics of a session that performed one of these queries:

SQL> select name , value/1024/1024 MB
from v$statname n,  v$mystat s
where n.statistic# = s.statistic# and
n.name in ('cell physical IO interconnect bytes returned by smart scan', 'cell physical IO bytes saved by storage index'); 

NAME                                                                     MB
---------------------------------------------------------------- ----------
cell physical IO bytes saved by storage index                    1038.32813
cell physical IO interconnect bytes returned by smart scan       9.56008911

The vast majority of the table has actually been skipped as a result of the storage indexes.

I’m now going to create a very basic version of a Zone Map, one based just on this ALBUM_ID column:

SQL> create materialized zonemap big_bowie_album_id_zm on big_bowie(album_id);

Materialized zonemap created.

Zone Maps are implemented in a similar manner to Materialized Views and are separate table structures that can be refreshed in a similar manner to how Materialized Views are refreshed. This is an important point and a key difference between Zone Maps and conventional indexes in that, depending on its refresh property, a Zone Map may not be updated during DML operations. The default refresh property is REFRESH ON LOAD DATA MOVEMENT, which means the Zone Map in part becomes “stale” after DMLs.
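Spelled out in full, the CREATE statement above with its default refresh property is equivalent to something like the following (again a sketch; check the exact clause placement against the 12.1.0.2 docs):

SQL> create materialized zonemap big_bowie_album_id_zm
refresh on load data movement
on big_bowie(album_id);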

Zone Maps basically store the min and max of the corresponding column(s) within each “zone” of the table. If we have a look at the contents of the Zone Map (below listing in part):

SQL> select zone_id$, min_1_album_id, max_1_album_id, zone_rows$
from big_bowie_album_id_zm;

       ZONE_ID$ MIN_1_ALBUM_ID MAX_1_ALBUM_ID ZONE_ROWS$
--------------- -------------- -------------- ----------
   385855025152              1              2      66234
   385855029250              5              6      56715
   385855029251              7              7      76562
   385855025155              7              8      76632
   385855004675              8              9      76633
   385855025161             21             22      75615
   385855004684             29             29      75582
   385855004685             31             32      75545
   385855004687             35             36      75617
   385855029267             43             44      75615
   385855029270             50             50      75481
   385855029275             61             62      75616
   385855025179             62             63      75530
   385855029284             81             82      75615
   385855029285             84             84      75480
   385855004710             87             88      75616
   385855004711             90             91      75484
   385855029293            100            100      75799
   385855029254             13             14      75615
   385855029255             16             16      75481
   385855004681             22             22      75480
   385855004682             24             25      75616


...

   385855025184             73             74      75615
   385855004705             76             77      75615
   385855029283             79             80      75615
   385855029286             86             87      75616
   385855029287             88             89      75618
   385855004714             97             97      75771
   385855029295            100            100      15871



134 rows selected.

A few things to note here. Firstly, the amount of data stored for the Zone Map is tiny. Each zone corresponds to approx. 8M of storage or in this case roughly 75,500 rows (the 134 zones at roughly 75,500 rows apiece account for the table’s full 10M rows). This is another key difference between Zone Maps and conventional B-Tree indexes. There’s an index entry per referenced row of the table whereas a Zone Map only has one entry per zone of the table. Note also, because the ALBUM_ID data is so well clustered, each zone has a very narrow min/max range of ALBUM_ID values. This means the Zone Map can be extremely effective in “pruning” storage accesses as Oracle can very easily eliminate the majority of zones from containing data of interest.

To highlight just how tiny this Zone Map is:

SQL> select segment_name, segment_type, blocks, bytes from dba_segments where segment_name like 'BIG_BOWIE%';

SEGMENT_NAME              SEGMENT_TYPE     BLOCKS      BYTES
------------------------- ------------ ---------- ----------
BIG_BOWIE                 TABLE            139264 1140850688
BIG_BOWIE_ALBUM_ID_I      INDEX             20480  167772160
BIG_BOWIE_ALBUM_ID_ZM     TABLE                 8      65536

 

We only need one 64K extent for the Zone Map whereas the Table/Index are much larger. In fact, the 134 rows can actually all fit in just the one 8K data block !!

Two new data dictionary views provide details on our Zone Maps:

SQL> select zonemap_name, pruning, refresh_mode, invalid, stale, unusable  
from dba_zonemaps where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

ZONEMAP_NAME              PRUNING  REFRESH_MODE      INVALID STALE   UNUSABLE
------------------------- -------- ----------------- ------- ------- --------
BIG_BOWIE_ALBUM_ID_ZM     ENABLED  LOAD DATAMOVEMENT NO      NO      NO

So the current status of the Zone Map can be determined in DBA_ZONEMAPS.

 

SQL> select measure, position_in_select, agg_function, agg_column_name
from dba_zonemap_measures where zonemap_name='BIG_BOWIE_ALBUM_ID_ZM';

MEASURE                        POSITION_IN_SELECT AGG_FUNCTION  AGG_COLUMN_NAME
------------------------------ ------------------ ------------- ---------------
"BOWIE"."BIG_BOWIE"."ALBUM_ID"                  3 MAX           MAX_1_ALBUM_ID
"BOWIE"."BIG_BOWIE"."ALBUM_ID"                  2 MIN           MIN_1_ALBUM_ID

Additionally, details of the columns within the Zone Map can be determined in DBA_ZONEMAP_MEASURES.

If we now re-run the same query on the BIG_BOWIE table:

SQL> select * from big_bowie where album_id=42;

100000 rows selected.

Elapsed: 00:00:00.28

Execution Plan
----------------------------------------------------------
Plan hash value: 1980960934

----------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |   100K|  8984K|  3269  (12)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL WITH ZONEMAP| BIG_BOWIE |   100K|  8984K|  3269  (12)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("ALBUM_ID"=42)
filter(SYS_ZMAP_FILTER('/* ZM_PRUNING */ SELECT "ZONE_ID$", CASE WHEN
BITAND(zm."ZONE_STATE$",1)=1 THEN 1 ELSE CASE WHEN (zm."MIN_1_ALBUM_ID" > :1 OR
zm."MAX_1_ALBUM_ID" < :2) THEN 3 ELSE 2 END END FROM "BOWIE"."BIG_BOWIE_ALBUM_ID_ZM" zm
WHERE zm."ZONE_LEVEL$"=0 ORDER BY zm."ZONE_ID$"',SYS_OP_ZONE_ID(ROWID),42,42)<3 AND
"ALBUM_ID"=42)
Statistics
----------------------------------------------------------
141  recursive calls
0  db block gets
2364  consistent gets
0  physical reads
0  redo size
3130019  bytes sent via SQL*Net to client
760  bytes received via SQL*Net from client
21  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
100000  rows processed

We notice the CBO has used the Zone Map during the FTS (see the TABLE ACCESS STORAGE FULL WITH ZONEMAP execution step and filter predicate information). It’s performing some additional consistent gets over the database index (2364 vs. 1590) but it’s not too bad, nowhere near the 139,000 odd blocks in the table, with performance comparable to before. The extra consistent gets are necessary as any possible zones need to be read in their entirety, whereas a B-Tree index can point to just the specific table blocks that contain data of interest.

So the Zone Map has provided a “good enough” outcome here. Meanwhile, it has required very minimal storage; we can control how the Zone Map is maintained (to be discussed later) and we can guarantee that the Zone Map object will always exist and be present, even after bouncing either the database or the Exadata storage servers.

In Part II, I’ll look at what happens when I perform some DML on the base table.

12.1.0.2 Introduction to Attribute Clustering (The Division Bell) August 26, 2014

Posted by Richard Foote in 12c, Attribute Clustering, Clustering Factor, Oracle Indexes.
6 comments

One of the really cool new features introduced in 12.1.0.2 is Attribute Clustering. This new table-based attribute allows you to very easily cluster data in close physical proximity based on the content of specific columns.

As I’ve discussed many times, indexes love table data that is physically clustered in a similar manner to the index, as this can significantly improve the efficiency of such indexes. A low Clustering Factor (CF) makes an index more viable and is one of the more important considerations in CBO calculations.

But it’s not only database indexes that benefit from well clustered data. Other index structures such as Exadata Storage Indexes and the new Zone Maps (to be discussed in future articles) all benefit from well clustered data. Additionally, compression is likely to be much more effective with data that is well clustered and this in turn also impacts the efficiency of In-Memory data (again, to be discussed in future articles).

So having the capability to now easily cluster data in regular heap tables has potentially many benefits.

To illustrate, I’m first going to create a table with data that is not well clustered at all. The CODE column has data that is basically evenly distributed throughout the whole table structure:

SQL> create table ziggy (id number, code number, name varchar2(30));

Table created.

SQL> insert into ziggy select rownum, mod(rownum,100), 'DAVID BOWIE' from dual connect by level <= 2000000;

2000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY', estimate_percent=>null,
method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

I’ll next create an index on this CODE column and check out its default CF:

SQL> create index ziggy_code_i on ziggy(code);

Index created.

SQL> select index_name, clustering_factor, num_rows from user_indexes
where index_name='ZIGGY_CODE_I';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY_CODE_I                    703133    2000000

For a table with 2 million rows, a CF of some 703,133 is very high and the index is going to be very inefficient when retrieving high numbers of rows.

Let’s run a query that returns a specific CODE value, approx. 1% of all the data (note I’ve set a large arraysize to minimize unnecessary fetches and resultant consistent gets):

SQL> set arraysize 5000

SQL> select * from ziggy where code = 42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

-----------------------------------------------------------------------------------
| Id  | Operation                 | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |       | 20000 |   390K|   383  (17)| 00:00:01 |
|*  1 |  TABLE ACCESS STORAGE FULL| ZIGGY | 20000 |   390K|   383  (17)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - storage("CODE"=42)
filter("CODE"=42)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
15212  consistent gets
0  physical reads
0  redo size
211208  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

The CBO has chosen a Full Table Scan and has decided to not use the index. If we hint the SQL:

SQL> select /*+ index (ziggy, ziggy_code_i) */ * from ziggy where code = 42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3294205578

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              | 20000 |   390K|  7081   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY        | 20000 |   390K|  7081   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_CODE_I | 20000 |       |    43   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
7081  consistent gets
41  physical reads
0  redo size
511195  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

At a cost of 7081, the index is way more expensive than the 383 cost for the FTS. The poor clustering of the CODE data within the table has made the index non-viable.
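
These index costings are no mystery by the way. As a rough sketch using the familiar I/O-based costing formula (ignoring CPU costing adjustments):

index access cost = blevel + ceil(leaf_blocks x selectivity) + ceil(clustering_factor x selectivity)

With 100 evenly distributed CODE values, the selectivity is 1/100 = 0.01. The index range scan component comes to the 43 we see in the plan, while the table access component is approximately 703133 x 0.01 = 7031, giving a total right in the ballpark of the 7081 reported. The CF totally dominates the cost of this plan.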

Let’s now create another table, but this one with a clustering attribute set on the CODE column:

SQL> create table ziggy2 (id number, code number, name varchar2(30))
clustering by linear order (code) without materialized zonemap;

Table created.

The CLUSTERING BY LINEAR ORDER clause orders data in the table based on the specified columns, in this case the CODE column. Up to 10 columns can be included using this particular technique (there are other attribute clustering options which I'll cover in later articles, yes I'll be writing quite a few new articles) 🙂 The WITHOUT MATERIALIZED ZONEMAP clause means I don't want to create these new Zone Map index structures at this stage, which could potentially reduce the amount of table storage needing to be accessed (again, I'll discuss these at another time).
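
For the curious, a sketch of the main alternative syntax (using a hypothetical ZIGGY3 table, not part of this demo): CLUSTERING BY INTERLEAVED ORDER clusters the data along a multi-dimensional curve rather than a strict linear sort, making it more forgiving when queries filter on varying column combinations:

SQL> create table ziggy3 (id number, code number, name varchar2(30))
clustering by interleaved order ((id), (code))
yes on load yes on data movement
without materialized zonemap;

The YES ON LOAD and YES ON DATA MOVEMENT directives determine whether the clustering is honoured during direct path loads and during table reorganizations respectively.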

You must use a direct path insert to make use of attribute clustering (or reorganize the table as we’ll see).
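To be clear, a conventional path insert such as the following (a hypothetical statement, not run in this demo) would not perform the implicit sort and the rows would simply be stored in whatever order they arrive:

SQL> insert into ziggy2 select * from ziggy;
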

So let's insert the exact same data into this new ZIGGY2 table via a straight direct path sub-select:

SQL> insert /*+ append */ into ziggy2 select * from ziggy;

2000000 rows created.

SQL> SELECT * FROM TABLE(dbms_xplan.display_cursor);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------
SQL_ID  0arqdyc9vznpg, child number 0
-------------------------------------
insert /*+ append */ into ziggy2 select * from ziggy

Plan hash value: 1975011999

--------------------------------------------------------------------------------------------------
| Id  | Operation                        | Name  | Rows  | Bytes |TempSpc| Cost(%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT                 |       |       |       |       | 10596 (100)|          |
|   1 |  LOAD AS SELECT                  |       |       |       |       |            |          |
|   2 |   OPTIMIZER STATISTICS GATHERING |       |  2000K|    38M|       | 10596   (3)| 00:00:01 |
|   3 |    SORT ORDER BY                 |       |  2000K|    38M|    61M| 10596   (3)| 00:00:01 |
|   4 |     TABLE ACCESS STORAGE FULL    | ZIGGY |  2000K|    38M|       |   376  (16)| 00:00:01 |
--------------------------------------------------------------------------------------------------

SQL> commit;

Commit complete.

Notice the SORT ORDER BY step in the insert execution plan. This implicitly sorts the incoming data in CODE order to satisfy the attribute clustering requirement.

If we now gather statistics, create an equivalent index on this table and examine the CF:

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'ZIGGY2',
estimate_percent=>null, method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

SQL> create index ziggy2_code_i on ziggy2(code);

Index created.

SQL> select index_name, clustering_factor, num_rows
from user_indexes where index_name='ZIGGY2_CODE_I';

INDEX_NAME           CLUSTERING_FACTOR   NUM_ROWS
-------------------- ----------------- ----------
ZIGGY2_CODE_I                     7072    2000000

We notice the default CF is indeed significantly lower, at just 7072, compared to the previous value of 703,133.

If we now run the equivalent query as before, this time against the ZIGGY2 table:

SQL> select * from ziggy2 where code=42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 16801974

-----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name          | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |               | 20000 |   390K|   114   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY2        | 20000 |   390K|   114   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY2_CODE_I | 20000 |       |    43   (3)| 00:00:01 |
-----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
121  consistent gets
41  physical reads
0  redo size
511195  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

We notice the CBO has now decided to use the index. This is because the cost of the index-based execution plan is just 114, significantly lower than both the previous index cost of 7081 and the FTS cost of 383 (and again consistent with the rough formula above: 43 + 7072 x 0.01 ≈ 114). Just as importantly, the resultant number of consistent gets has dropped to just 121, a small fraction of the 7081 consistent gets when using the index on the original table. So the index is indeed much more efficient to use, the CBO costings are just about spot on, and the end result is that performance has improved.

So how do we now likewise improve the performance of the first table? Simply add the attribute clustering and reorganize the table:

SQL> alter table ziggy add clustering by linear order(code) without materialized zonemap;

Table altered.

SQL> alter table ziggy move;

Table altered.
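
Note the table MOVE relocates every row and hence changes all the rowids, leaving any existing indexes in an UNUSABLE state, which is why the index rebuild below is necessary. An easy way to confirm this (a quick check of my own, not from the original output):

SQL> select index_name, status from user_indexes where index_name='ZIGGY_CODE_I';

At this point, before the rebuild, the STATUS would be reported as UNUSABLE.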

SQL> alter index ziggy_code_i rebuild;

Index altered.

SQL> select index_name, clustering_factor, num_rows from user_indexes where index_name='ZIGGY_CODE_I';

INDEX_NAME      CLUSTERING_FACTOR   NUM_ROWS
--------------- ----------------- ----------
ZIGGY_CODE_I                 7134    2000000

As expected, the CF has likewise reduced. If we now run the query:

SQL> select * from ziggy where code=42;

20000 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3294205578

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              | 20000 |   390K|   115   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY        | 20000 |   390K|   115   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_CODE_I | 20000 |       |    43   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
121  consistent gets
0  physical reads
0  redo size
511195  bytes sent via SQL*Net to client
585  bytes received via SQL*Net from client
5  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
20000  rows processed

The query likewise uses the index with far fewer consistent gets, and performance is significantly better.

So attribute clustering provides a nice mechanism by which data in a heap table (or importantly, within a partition or sub-partition) can be physically clustered in a manner that's potentially beneficial in various scenarios. Of course, the decision on how to actually cluster the data and on which columns is somewhat crucial 🙂
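
One final tip: the clustering definitions themselves are visible in the data dictionary. If memory serves (treat the view name as my assumption here, I'll cover the dictionary details in the later articles), something like:

SQL> select * from user_clustering_tables;

will list the tables with attribute clustering defined and their clustering options.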