
Next Round Of ANZ “Let’s Talk Database” Events (July/August 2016) June 20, 2016

Posted by Richard Foote in Let's Talk Database, Oracle Indexes.
5 comments

I’ll be presenting the next round of “Let’s Talk Database” events around Australia and NZ this winter in July/August 2016. These are free events but due to limited places have often “sold out” in the past, so booking early is recommended to avoid disappointment.

All events run between 9:00am – 12:30pm and are followed by a networking lunch.

We always have a fun time at these events and feedback suggests everyone learns something, so I hope if you’re lucky enough to be in the ANZ region, you can make it to your local event:)

Currently, the confirmed events and topics are as follows:

  • Sydney – Wednesday, 20 July 2016: Let’s Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: mitch.foster@oracle.com
  • Melbourne – Thursday, 21 July 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: johanne.sergeant@oracle.com
  • Canberra – Thursday, 28 July 2016: Let’s Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: richard.foote@oracle.com
  • Wellington – Wednesday, 10 August 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
  • Auckland – Thursday, 11 August 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
  • Perth – Wednesday, 17 August 2016: Let’s Talk Database: Intro To CBO Costings, Database Futures. To register, email: dennis.ward@oracle.com
  • Adelaide – Thursday, 18 August 2016: Let’s Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: brodie.james@oracle.com
  • Brisbane – Wednesday, 24 August 2016: Let’s Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: gordon.laing@oracle.com

 

Details of sessions:

Performance Diagnostics – How To Correctly Use AWR Reports

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting, as they contain so much database diagnostic information that they often don’t know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient, and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how to accurately determine the actual root causes through an appropriate AWR report.

Time permitting, we’ll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH) and SQL Extended Tracing, and the scenarios in which they can prove invaluable in addressing database performance issues.

 

Introduction Into The CBO (Why Is The CBO Doing A Full Table Scan?)

A common issue is confusion over why the Oracle Cost Based Optimizer (CBO) has made the decision to use a full table scan, or to use one index rather than another that appears more appropriate. The costing figures that the CBO derives and displays in execution plans are often viewed as cryptic, which is unfortunate as these costings are key to CBO decisions. Often the attempt to resolve such issues is to set inappropriate database parameters, which introduces as many issues as it resolves. This session looks under the covers of the CBO to see how it precisely derives its various costings and hence how to correctly troubleshoot and resolve inefficient queries when they arise. Bring a calculator, it might come in handy:)

Database Futures – Exadata Secret Sauce and Software on Silicon

Oracle has recently released the new generation of the Exadata machine, the X6-2. Not only is it bigger and faster, but it also has a host of features and capabilities that are unique to the Exadata storage platform, including Smart Scans, Storage Indexes, Exafusion, Smart Fusion Block Transfer, Dual Format Columnar Flash Cache, I/O Latency Capping, Hybrid Columnar Compression, Sparse Snapshots and In-Memory Database Fault Tolerance. All these unique features and their enhancements with the new X6-2 will be discussed. Additionally, Oracle’s Software-in-Silicon features extend the concepts of co-engineering hardware and software inherent in engineered systems such as the Exadata. Embedding software which has traditionally sat in the database layer into the hardware layer can yield both performance and security beyond that available from a traditional CPU. This presentation also covers how Oracle Database 12c can leverage Software-in-Silicon features, where the Oracle Database is heading in the future and why you should care.

 

Oracle Database 12c – Built for Data Warehousing

The Oracle Database is the leading database in the market, but it might come as a surprise to some that it is also the leading database with respect to Data Warehousing, in terms of both sales and analyst rankings. The focus today is a detailed look at all the special database capabilities that make the Oracle Database the perfect platform for Data Warehouse type applications.

We explore the overall Data Warehousing Information Management architecture and how it relates to both data warehouses and data reservoirs. We then deep dive into standard database features such as Materialized Views, Dimensions, Bitmap Indexes, Query Re-Write, JSON in the database, Partitioning, Parallel Processing, Compression, Attribute Clustering and SQL for analysis, reporting, aggregation and pattern matching.

We then discuss more advanced Oracle database capabilities such as Data Mining, Predictive Analytics, R in the database, Database In-Memory, Big Data SQL, Spatial & Graph and the specific engineered systems capabilities that make Oracle such an effective Data Warehouse platform, especially when combined with the enterprise capabilities already associated with the Oracle Database such as security, high availability, scalability and manageability.

Storing Date Values As Numbers (The Numbers) June 1, 2016

Posted by Richard Foote in 12c, CBO, Histograms, Oracle Indexes, Storing Dates As Numbers.
4 comments

In my last couple of posts, I’ve been discussing how storing date data in a character based column is a really really bad idea.

In a follow-up question, I was asked if storing dates in NUMBER format was a better option. The answer is that it’s probably an improvement over storing dates as strings, but it’s still a really really bad idea. Storing dates in DATE format is easily the best option, just as storing any data in its native data type is.

In this post, I’ll highlight a few of the classic issues with storing dates in basic number format as well as showing you some of the calculations on the CBO cardinality estimates.

As usual, the demo starts with a basic little table that I’ll populate with date data stored in a NUMBER column (ZIGGY_DATE):

SQL> create table ziggy (id number, code number, ziggy_date number);
    
Table created.

SQL> insert into ziggy select rownum, mod(rownum,1000), 
to_number(to_char(sysdate-mod(rownum,10000), 'YYYYMMDD')) 
from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

We’ll now collect statistics on the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='ZIGGY';

COLUMN_NAME NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
----------- ------------ ---------- --------------- --- ---
ZIGGY_DATE         10000      .0001 NONE            NO  NO
CODE                1000       .001 NONE            NO  NO
ID               1000000    .000001 NONE            NO  NO

So the ZIGGY_DATE column has 10,000 distinct dates (with 100 rows per distinct date), with a column density of 1/10000 = 0.0001.

Let’s now create a standard B-Tree index on the ZIGGY_DATE column:

SQL> create index ziggy_date_i on ziggy(ziggy_date);
                  
Index created.

If we look at a sample of the data in the column and the min/max date ranges:

SQL> select * from ziggy where rownum <11;
        
        ID       CODE ZIGGY_DATE
---------- ---------- ----------
       776        776   20140412
       777        777   20140411
       778        778   20140410
       779        779   20140409
       780        780   20140408
       781        781   20140407
       782        782   20140406
       783        783   20140405
       784        784   20140404
       785        785   20140403

SQL> select min(ziggy_date) min, max(ziggy_date) max from ziggy;

       MIN        MAX
---------- ----------
  19890110   20160527

We see that all the data in the ZIGGY_DATE column are just number representations of dates, with a range between 10 Jan 1989 and 27 May 2016.

Note there are actually only 10,000 distinct days between the dates, but the CBO would estimate a range of 270,417 possible days (20160527 – 19890110 = 270,417). The CBO has no idea that the “numbers” within the column are all dates and that there are ranges of values in which data is relatively popular (e.g. between say 20160101 and 20160131) and ranges of values in which data is relatively unpopular (e.g. say between 20154242 and 20159999).

Although not as bad as the range of possible unpopular values found within a character data type, as I discussed previously when storing date data as a string, there is still enough data skew when storing dates as numbers to be problematic for the CBO.

If we select just one date with an equality predicate:

SQL> select * from ziggy where ziggy_date = 20150613;
                 
100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2700236208

----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |              |  100 |  1500 |     103 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY        |  100 |  1500 |     103 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_I |  100 |       |       3 (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ZIGGY_DATE"=20150613)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
110 consistent gets
0 physical reads
0 redo size
3883 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

The CBO gets things spot on, correctly estimating 100 rows to be returned, as it knows there are only 10,000 distinct values, of which just one is being selected.

Selectivity is basically the density of the column = 1/10000 = 0.0001, so the estimated cardinality is 0.0001 x 1M rows = 100 rows. Perfect.
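If you want to reproduce this figure yourself, the estimate can be derived directly from the dictionary statistics. The following is just a minimal sketch (it assumes the statistics gathered above are in place and that you have access to the DBA views):

SQL> select round(c.density * t.num_rows) est_card 
from dba_tab_cols c, dba_tables t 
where c.owner = t.owner and c.table_name = t.table_name 
and c.table_name = 'ZIGGY' and c.column_name = 'ZIGGY_DATE';

With the statistics above, this returns the same 100 row estimate.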

However, if we perform a range based query as follows:

SQL> select * from ziggy where ziggy_date between 20151010 and 20151111;
     
3300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2700236208

----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |              |  573 |  8595 |     580 (1) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY        |  573 |  8595 |     580 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_I |  573 |       |       4 (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ZIGGY_DATE">=20151010 AND "ZIGGY_DATE"<=20151111)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3531 consistent gets
0 physical reads
0 redo size
108973 bytes sent via SQL*Net to client
2961 bytes received via SQL*Net from client
221 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3300 rows processed

The CBO has got things somewhat incorrect in this example and has underestimated the expected number of rows (573 rows vs. the 3,300 rows actually returned).

The actual number of days between these dates is 33 so the actual ratio of data returned is 33/10000 x 1M rows = 3,300 rows. This is a range of “numbers” that overall covers a relatively “popular” range of date values.

However, Oracle is estimating a range of some 20151111 – 20151010 = 101 days between these dates. As the total range of possible days is 20160527 – 19890110 = 270,417, the estimated ratio of returned rows is 101/270417, plus 2 x the selectivity of a day for the 2 implicit equality conditions (as a between is effectively >= and <=). The selectivity of one day is just the density of the column, 0.0001, as illustrated in the previous query.

Therefore, the query selectivity is derived as being (101/270417) + (2 x 0.0001) = 0.000573 when multiplied by 1M rows = 573 rows as estimated by the CBO.
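As a sanity check, this arithmetic can be performed directly in SQL (a simple sketch using the figures derived above):

SQL> select round(((20151111 - 20151010) / (20160527 - 19890110) 
+ 2 * 0.0001) * 1000000) est_card from dual;

This returns 573, matching the CBO’s estimate in the plan above.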

So the CBO is rather significantly *under* estimating the rows to be returned which could result in a sub-optimal execution plan (such as the inappropriate use of an index range scan as in this example, noting the poor clustering of the data).

If we now look at another range scan below:

SQL> select * from ziggy where ziggy_date between 20151225 and 20160101;
    
800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows  | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 33023 |  483K |    810 (15) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | ZIGGY | 33023 |  483K |    810 (15) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ZIGGY_DATE">=20151225 AND "ZIGGY_DATE"<=20160101)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2824 consistent gets
0 physical reads
0 redo size
23850 bytes sent via SQL*Net to client
1135 bytes received via SQL*Net from client
55 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
800 rows processed

The actual number of days between these dates is only 8 so the actual ratio of data returned is 8/10000 x 1M rows = 800 rows. This is a range of “numbers” that overall covers a relatively “unpopular” range of date values.

However, Oracle is estimating a range of some 20160101 – 20151225 = 8876 days between these dates. As the total range of possible days is 20160527 – 19890110 = 270,417, the estimated ratio of returned rows is 8876/270417, plus 2 x the selectivity of a single day, again for the 2 implicit equality conditions.

Therefore, the query selectivity is derived as being (8876/270417) + (2 x 0.0001) = 0.033023 when multiplied by 1M rows = 33,023 rows as estimated by the CBO.
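The same sanity check reproduces this estimate too:

SQL> select round(((20160101 - 20151225) / (20160527 - 19890110) 
+ 2 * 0.0001) * 1000000) est_card from dual;

This returns 33023, as per the plan above.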

So the CBO is rather significantly *over* estimating the rows to be returned, which could again result in a sub-optimal execution plan (such as the inappropriate use of a Full Table Scan in this example). The CBO is simply not picking up the fact that most of the possible values between the “number” ranges aren’t valid dates and can’t possibly exist.

Of course, having dates stored as simple numbers means Oracle has no way of ensuring data integrity and can allow “invalid” dates to be inserted:

SQL> insert into ziggy values (1000001, 42, 20160599);
            
1 row created.

SQL> rollback;

Rollback complete.

As with dates stored as strings, we can again address these issues by either collecting histograms for such columns and/or by creating a function-based date index on the column:

SQL> create index ziggy_date_fn_i on ziggy(to_date(ziggy_date,'YYYYMMDD'));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='ZIGGY';

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
SYS_NC00004$        10000      .0001 NONE            YES YES
ZIGGY_DATE          10000      .0001 HYBRID          NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO

The associated query with the equality predicate has accurate estimates as it did previously:

SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') = '13-JUN-2015';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 945728471

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |  100 |   2300 |     103 (0)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY           |  100 |   2300 |     103 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_FN_I |  100 |        |       3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')=TO_DATE(' 2015-06-13 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
111 consistent gets
0 physical reads
0 redo size
2877 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

As the virtual column created for the function-based index also has 10,000 distinct values and a corresponding density of 0.0001, the CBO is getting the cardinality estimate of 100 rows spot on.

But importantly, both associated range based queries are now also being accurately costed by the CBO as it now knows the data being searched is date based and hence can more accurately determine the actual expected dates to be returned within the specified “date” ranges.

SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') between '10-OCT-2015' and '11-NOV-2015';

3300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 3400 | 78200 |   1061 (35) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | ZIGGY | 3400 | 78200 |   1061 (35) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')>=TO_DATE('
2015-10-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')<=TO_DATE(' 2015-11-11
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
2991 consistent gets
0 physical reads
0 redo size
95829 bytes sent via SQL*Net to client
2961 bytes received via SQL*Net from client
221 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3300 rows processed

The CBO is now estimating not 573 rows, but 3,400 rows, which is much closer to the actual 3,300 rows being returned. As a result, the CBO is now performing a more efficient Full Table Scan (due to the poor Clustering Factor of the index) than the Index Range Scan performed previously.
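The new estimate is consistent with the standard range formula, but now computed in actual days. A quick sketch (assuming the 10 Jan 1989 and 27 May 2016 min/max dates shown earlier, and an English NLS date language):

SQL> select round(((to_date('11-NOV-2015','DD-MON-YYYY') - to_date('10-OCT-2015','DD-MON-YYYY')) 
/ (to_date('27-MAY-2016','DD-MON-YYYY') - to_date('10-JAN-1989','DD-MON-YYYY')) 
+ 2 * 0.0001) * 1000000) est_card from dual;

That’s 32 days out of a 9999 day min/max range, returning the 3,400 row estimate shown in the plan.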

If we look at the other range scan query:

SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') between '25-DEC-2015' and '01-JAN-2016';

800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 945728471

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |  900 | 20700 |      909 (1)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY           |  900 | 20700 |      909 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_FN_I |  900 |       |        5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')>=TO_DATE(' 2015-12-25 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')<=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
861 consistent gets
7 physical reads
0 redo size
18917 bytes sent via SQL*Net to client
1135 bytes received via SQL*Net from client
55 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
800 rows processed

The CBO is now estimating not 33,023 rows, but 900 rows, which is again much closer to the actual 800 rows being returned. As a result, the CBO is now performing a more efficient Index Range Scan than the Full Table Scan it was performing previously.
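And the same date arithmetic for this range:

SQL> select round(((to_date('01-JAN-2016','DD-MON-YYYY') - to_date('25-DEC-2015','DD-MON-YYYY')) 
/ (to_date('27-MAY-2016','DD-MON-YYYY') - to_date('10-JAN-1989','DD-MON-YYYY')) 
+ 2 * 0.0001) * 1000000) est_card from dual;

That’s 7 days out of 9999, giving the 900 row estimate in the plan.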

And of course, via the function-based date index, the database now has a manner in which to protect the integrity of the date data:

SQL> insert into ziggy values (1000001, 42, 20160599);
insert into ziggy values (1000001, 42, 20160599)
*
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month

However, the best way in Oracle to store “Date” data is within a Date data type column …
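For completeness, the kind of conversion involved might look as follows. This is only a sketch (the new column name is made up, it assumes every stored number really is a valid ‘YYYYMMDD’ date and that an outage window is available, and any dependent indexes would need to be recreated):

SQL> alter table ziggy add (ziggy_date_new date);

SQL> update ziggy set ziggy_date_new = to_date(to_char(ziggy_date), 'YYYYMMDD');

SQL> alter table ziggy drop column ziggy_date;

SQL> alter table ziggy rename column ziggy_date_new to ziggy_date;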

Storing Date Values As Characters Part II (A Better Future) May 30, 2016

Posted by Richard Foote in 12c, CBO, Function Based Indexes, Oracle Indexes, Storing Dates as Characters.
1 comment so far

In the previous post, I discussed how storing date values within a character data type is a really really bad idea and illustrated how the CBO can easily get its costings totally wrong as a result. A function-based date index helped the CBO get the correct costings and protect the integrity of the date data.

During the demo, I re-collected statistics on the table, as the hidden virtual column associated with the newly created function-based index doesn’t initially have statistics.

Before re-collecting statistics:

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';
 
COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
SYS_NC00004$                         NONE            YES YES
BOWIE_DATE          10000      .0001 NONE            NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO

And afterwards:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed. 

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column 
from dba_tab_cols where table_name='BOWIE'; 

COLUMN_NAME  NUM_DISTINCT DENSITY    HISTOGRAM       HID VIR 
------------ ------------ ---------- --------------- --- --- 
SYS_NC00004$        10000      .0001 NONE            YES YES 
BOWIE_DATE          10000      .0001 HYBRID          NO  NO 
CODE                 1000       .001 NONE            NO  NO 
ID                1000000    .000001 NONE            NO  NO

 

We can see that the hidden virtual column now has statistics.

But we also notice another difference, that being the BOWIE_DATE column now has a histogram (of type Hybrid).

As discussed in the previous post, the issue here is that the date data within the character column covers only a very specific subset of all the potential character values that could reside within the column. The CBO was therefore getting the range scan selectivity hopelessly wrong.

Now that we’ve run a few queries featuring the BOWIE_DATE column in the predicates and as there’s effectively data skew within the column, the column becomes a candidate for a histogram with the default SIZE AUTO collection method.
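One way to confirm why the histogram appeared is to look at the column usage information Oracle has recorded. A minimal sketch (DBMS_STATS.REPORT_COL_USAGE requires appropriate privileges, and this assumes the table resides in the current schema):

SQL> set long 100000

SQL> select dbms_stats.report_col_usage(user, 'BOWIE') from dual;

The report should show both EQUALITY and RANGE usage recorded against the BOWIE_DATE column, which together with the detected skew makes it a histogram candidate under SIZE AUTO.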

The histogram now provides the CBO with a much more accurate picture of the distribution of the data within the BOWIE_DATE column, and of the fact that between discrete “date” column values, there are only so many rows that qualify.

As a result of the histogram, the CBO can now make much more accurate cardinality estimates.

If we now re-run the query that actually returns 8300 rows but for which the CBO previously estimated only 100 rows would be returned:

SQL> select * from bowie where bowie_date between '2015 10 10' and '2015 12 31';
                                             
8300 rows selected.
                                                            
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 4152 | 83040 |   1000 (12) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE | 4152 | 83040 |   1000 (12) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("BOWIE_DATE">='2015 10 10' AND "BOWIE_DATE"<='2015 12 31')

Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
4063 consistent gets
0 physical reads
0 redo size
282075 bytes sent via SQL*Net to client
6635 bytes received via SQL*Net from client
555 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
8300 rows processed

We see that the estimate of 4,152 rows is now much better. Not perfect, but maybe good enough to now get the more efficient Full Table Scan execution plan.

If we re-run the query that returned over 1/2 the table, some 570,000 rows, but for which the CBO previously estimated only 116 rows:

SQL> select * from bowie where bowie_date between '2000 10 10' and '2016 12 31';
  
570800 rows selected.
    
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 572K |   10M |   1012 (13) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE | 572K |   10M |   1012 (13) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("BOWIE_DATE">='2000 10 10' AND "BOWIE_DATE"<='2016 12 31')
   
Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
41456 consistent gets
4 physical reads
0 redo size
19292352 bytes sent via SQL*Net to client
419135 bytes received via SQL*Net from client
38055 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
570800 rows processed

 

We see that with an estimate of 572K rows, the CBO has now got this just about right and again has made the right decision with the Full Table Scan execution plan.

Storing date data in character based columns is still a really really bad idea and limits the manner in which date data can be analysed, protected and accessed, but with appropriate histograms in place, at least the CBO has some chance of making a reasonable fist of things with some range based queries.

As a follow-up, I was asked if storing dates in NUMBER format is a better option than as a string. I’ll discuss that next.

Storing Date Values As Characters (What’s Really Happening) May 26, 2016

Posted by Richard Foote in 12c, Function Based Indexes, Oracle Indexes, Storing Dates as Characters.
1 comment so far

For something that’s generally considered an extremely bad idea, I’ve lost count of the number of times I’ve come across applications that insist on storing date values as characters within the database. We’ve all seen them …

I recently got called in to assist a customer who was having issues with a POC, in relation to the database not appearing to want to use the In-Memory Database option as expected. In various key scenarios, the CBO kept coming up with execution plans that used indexes (they were hesitant to drop these particular indexes), when, had it just used the Database In-Memory store, the queries would have run so much faster. So I was called in to find out what was really happening, and it turned out that the main culprit was indeed queries against columns where dates were stored as characters within the database. In the process, we found another issue with some “invalid” date values. Go figure.

Interestingly, both issues could be addressed by creating a new index on the date column …

I’ve kinda replicated the scenario here with the following little test case. I’ll begin by creating a table with a varchar2 field (bowie_date) that actually stores a whole bunch of “dates”:

SQL> create table bowie (id number, code number, bowie_date varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,1000), to_char(sysdate-mod(rownum,10000), 'YYYY MM DD') from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

So the bowie_date column basically has 10000 different dates, evenly distributed with 100 occurrences of each date.
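A quick sanity check of the distribution (just a sketch; the counts follow directly from the insert above):

SQL> select count(distinct bowie_date) distinct_dates, min(cnt) min_rows, max(cnt) max_rows 
from (select bowie_date, count(*) cnt from bowie group by bowie_date);

This should report 10000 distinct dates, each with exactly 100 rows.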

I’ll now collect statistics on the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
BOWIE_DATE          10000      .0001 NONE            NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO

 

So the bowie_date column indeed has 10000 distinct dates.

I’ll now create a standard b-tree index on this column:

SQL> create index bowie_date_i on bowie(bowie_date);

Index created.

So the data looks as follows, with bowie_date a varchar2 column that actually contains date data, with the following min/max range:

SQL> select * from bowie where rownum <11;

        ID       CODE BOWIE_DATE
---------- ---------- ----------
       916        916 2013 11 22
       917        917 2013 11 21
       918        918 2013 11 20
       919        919 2013 11 19
       920        920 2013 11 18
       921        921 2013 11 17
       922        922 2013 11 16
       923        923 2013 11 15
       924        924 2013 11 14
       925        925 2013 11 13

10 rows selected.

SQL> select min(bowie_date) min, max(bowie_date) max from bowie;

MIN        MAX
---------- ----------
1989 01 09 2016 05 26

If we run a query that uses an equality predicate as follows:

SQL> select * from bowie where bowie_date = '2015 06 13';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1525056162

----------------------------------------------------------------------------------------------------
| Id | Operation                             | Name         | Rows | Bytes | Cost (%CPU)|     Time |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                      |              |  100 |  2000 |     103 (0)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED   | BOWIE        |  100 | 2000  |     103 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                      | BOWIE_DATE_I |  100 |       |       3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("BOWIE_DATE"='2015 06 13')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
110 consistent gets
16 physical reads
0 redo size
4383 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed 

The CBO gets things just about right. 100 rows are estimated and indeed 100 rows are retrieved. So we have confidence the CBO has made the right decision in using the index here as so few rows are actually retrieved.

However, if we run a range scan predicate such as the following:

SQL> select * from bowie where bowie_date between '2015 10 10' and '2015 12 31';

8300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1525056162

----------------------------------------------------------------------------------------------------
| Id | Operation                          | Name         | Rows | Bytes | Cost (%CPU)|     Time |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                   |              |  100 | 2000  |     104 (0)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE        |  100 | 2000  |     104 (0)| 00:00:01 |
|*2  | INDEX RANGE SCAN                   | BOWIE_DATE_I |  100 |       |       3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("BOWIE_DATE">='2015 10 10' AND "BOWIE_DATE"<='2015 12 31')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8881 consistent gets
38 physical reads
0 redo size
315219 bytes sent via SQL*Net to client
6635 bytes received via SQL*Net from client
555 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
8300 rows processed

The CBO has got the costings wrong here. It still estimates that only 100 rows are to be returned when in actual fact 8300 rows come back.

If we select an even larger “date” range:

SQL> select * from bowie where bowie_date between '2000 10 10' and '2016 12 31';

570800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1525056162

----------------------------------------------------------------------------------------------------
| Id | Operation                          | Name         | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |              |  116 |  2320 |     120 (0)| 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE        |  116 |  2320 |     120 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                   | BOWIE_DATE_I |  116 |       |       3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("BOWIE_DATE">='2000 10 10' AND "BOWIE_DATE"<='2016 12 31')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
610491 consistent gets
1704 physical reads
0 redo size
21575496 bytes sent via SQL*Net to client
419135 bytes received via SQL*Net from client
38055 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
570800 rows processed

The CBO has got things seriously wrong here. We’re actually returning over 1/2 the table, some 570,800 rows, but the CBO thinks only 116 rows will be returned. Why ?

The problem comes back to storing date values as characters. The CBO has absolutely no idea that these “characters” are actually meant to be dates and has no idea that the only valid ranges of values are date values.

With a possible range between “1989 01 09” and “2016 05 26“, any character range/combination of values (up to the 42 column size) could potentially exist in this column (the value ‘1zxgs$.jKN6tglhasgdlhlhd23bkbk?k’ for example).

So the required range between ‘2000 10 10’ and ‘2016 12 31’ actually represents a relatively narrow range of possible values within the range of all possible values (especially as the leading column differs between the min/max).

Hence why the CBO is estimating such a low number of rows to be returned and hence why it is deciding to incorrectly use the index. “Hiding” the meaning and distribution of values from the CBO in this manner can be problematic to say the least.

Worse of course is the possibility of “invalid” dates being entered, as the database has no implicit way to police the integrity of the data:

SQL> insert into bowie values (1000001, 42, '2016 6 31');

1 row created.

SQL> rollback;

Rollback complete.

We know there’s no such date as 31st June but Oracle has no idea that this is logically invalid data. Or the value ‘lh;ghsgdsd7gdGLH96bb’ for that matter …

Did I mention that storing dates in a character column is a really really bad idea …

Now there are a couple of ways to help address these issues if changing the column and its datatype is not possible due to the application. One way is to create a function-based index as follows on a date version of the column:

SQL> create index bowie_date_fn_i on bowie(to_date(bowie_date,'YYYY MM DD'));

Index created.

If we run the equality query but this time referencing the to_date function:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') = '13-JUN-2015';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 960797537

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows  | Bytes  | Cost (%CPU)|     Time |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 | 10000 |   273K |     893 (2)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE           | 10000 |   273K |     893 (2)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_DATE_FN_I |  4000 |        |       3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE("BOWIE_DATE",'YYYY MM DD')=TO_DATE(' 2015-06-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
12 recursive calls
0 db block gets
113 consistent gets
16 physical reads
0 redo size
3268 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

We notice the CBO has got the row estimate way wrong here, thinking that 10000 rows, not 100 rows, are to be returned. Why ?

Remember, when we create a function-based index, Oracle creates a hidden virtual column on the table as Oracle needs a way to store the statistics associated with the result set from the function. But these statistics aren’t populated until we next collect statistics on the table (or explicitly for just the hidden columns) and without the column statistics, the CBO can make poor assumptions:

SQL> select column_name, data_type, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';

COLUMN_NAME  DATA_TYPE NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ --------- ------------ ---------- --------------- --- ---
SYS_NC00004$ DATE                              NONE            YES YES
BOWIE_DATE   VARCHAR2         10000      .0001 NONE            NO  NO
CODE         NUMBER            1000       .001 NONE            NO  NO
ID           NUMBER         1000000    .000001 NONE            NO  NO

There are no column statistics for the virtual column but we note the data type of the virtual column is DATE. So let’s collect new statistics on the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select column_name, data_type, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';

COLUMN_NAME  DATA_TYPE NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ --------- ------------ ---------- --------------- --- ---
SYS_NC00004$ DATE             10000      .0001 NONE            YES YES
BOWIE_DATE   VARCHAR2         10000      .0001 HYBRID          NO  NO
CODE         NUMBER            1000       .001 NONE            NO  NO
ID           NUMBER         1000000    .000001 NONE            NO  NO

There are actually two key differences in the above statistics, but in this post I’ll just focus on the fact that the hidden virtual column now has associated statistics (I’ll discuss the other key difference in my next blog post).
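Note that a full re-gather wasn’t strictly required here. As mentioned, the statistics can instead be collected explicitly for just the hidden columns, for example (a minimal sketch):

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=>'FOR ALL HIDDEN COLUMNS SIZE AUTO');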

When we re-run the query:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') = '13-JUN-2015';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 960797537

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |  100 |  2800 |     102 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE           |  100 |  2800 |     102 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_DATE_FN_I |  100 |       |       3 (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE("BOWIE_DATE",'YYYY MM DD')=TO_DATE(' 2015-06-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
108 consistent gets
0 physical reads
0 redo size
3268 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

The CBO has now got the estimate spot on.

If we now run the previous range scan query that accessed 1/2 the table, referencing the to_date function:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') between '10-OCT-2000' and '31-DEC-2015';

556100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows  | Bytes  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       |  556K |    14M |   1328 (34)| 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE |  556K |    14M |   1328 (34)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_DATE("BOWIE_DATE",'YYYY MM DD')>=TO_DATE(' 2000-10-10
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND TO_DATE("BOWIE_DATE",'YYYY MM
DD')<=TO_DATE(' 2015-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
40444 consistent gets
0 physical reads
0 redo size
18804277 bytes sent via SQL*Net to client
408355 bytes received via SQL*Net from client
37075 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
556100 rows processed

The CBO has got the estimates pretty well spot on and is now performing the far more efficient Full Table Scan. The CBO knows that the virtual column is of type DATE and therefore can much more accurately determine the actual cardinality estimates for the range scan on the “date” column.

If we now run the other corresponding range scan that returned a moderate number of rows:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') between '10-OCT-2015' and '31-DEC-2015';

8300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes  | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 8401 |   229K |   1205 (27) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE | 8401 |   229K |   1205 (27) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_DATE("BOWIE_DATE",'YYYY MM DD')>=TO_DATE(' 2015-10-10
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND TO_DATE("BOWIE_DATE",'YYYY MM
DD')<=TO_DATE(' 2015-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
4058 consistent gets
0 physical reads
0 redo size
282075 bytes sent via SQL*Net to client
6635 bytes received via SQL*Net from client
555 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
8300 rows processed

We notice that again the CBO has got the row estimate just about right and because the index has a poor clustering factor, the CBO still decided to go down the Full Table Scan path.
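Once again, the estimate lines up with simple date arithmetic (a sketch, assuming the ‘1989 01 09’ and ‘2016 05 26’ min/max values shown earlier):

SQL> select round(((to_date('2015 12 31','YYYY MM DD') - to_date('2015 10 10','YYYY MM DD')) 
/ (to_date('2016 05 26','YYYY MM DD') - to_date('1989 01 09','YYYY MM DD')) 
+ 2 * 0.0001) * 1000000) est_card from dual;

That’s 82 days out of a 9999 day range, giving the 8,401 row estimate in the plan.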

Even if we don’t use the index all that much (perhaps the Database In-Memory store is a better option for most queries), it can still play an important role in policing the integrity of the data. An attempt to insert an invalid date will now automatically be captured by the database and fail:

SQL> insert into bowie values (1000001, 42, '2016 6 31');
insert into bowie values (1000001, 42, '2016 6 31')
*
ERROR at line 1:
ORA-01839: date not valid for month specified

As the function-based index can only be populated or updated if a valid date is inserted into the table, any attempt to insert an invalid date will fail thus protecting the integrity of the data.

The best “fix” here is to store these dates in a date field within the database, where the above issues are automatically addressed. If this is not possible, then the introduction and usage of an associated function-based index can certainly assist the CBO in making the correct decision.

However, if data integrity is not a concern (the application does a fantastic job of it and no one ever has to perform manual data fixes directly in the database), then there’s another option to help make the CBO do the right thing.

The clue is back in how I collected the table statistics, which I’ll discuss in my next post.

New Oracle Cloud Offering – Indexing as a Service (IDXaaS) (I Pity The Fool) April 1, 2016

Posted by Richard Foote in 12c Rel 2, IDXaaS, Oracle Cloud, Oracle Indexes.
12 comments

This of course is an April Fools joke. Sorry !!

A very exciting announcement !! I’ve recently been promoted within Oracle Corporation to lead their brand new Oracle Cloud offering “Indexing as a Service” (IDXaaS) team, based on my previous work and experience in the indexing space. Yes, I’m both thrilled and excited to be involved in this new venture:)

Promising both improved levels of service and reductions in maintenance and administration overheads, this new Oracle Cloud offering will take all indexing related hassles from on-premise databases and address them in the Cloud. No more index rebuilds, no more concerns you don’t have the right indexes in place and importantly no more worrying that indexes are taking up valuable TBs of storage. Oracle indexes can now be safely and seamlessly migrated and stored in the Oracle Cloud.

Relational databases depend heavily on indexes to ensure data can be retrieved quickly and efficiently, but indexes can be very difficult to administer, can use considerable resources to maintain and consume considerable amounts of disk space. They can be so problematic that some folk actually devote an entire blog to the subject !! Imagine if all these issues and overheads could be taken away from you when administrating Oracle databases…

Index DDL will be enhanced in the 12c Release 2 Oracle Database to optionally specify a Cloud Domain Identifier (CDI) instead of a tablespace. So you will be able to either build or rebuild an index into the Oracle IDXaaS Cloud with syntax such as follows: CREATE INDEX bowie_idx on bowie(id) CLOUD LOCATION bowies_cloud_domain; This index will fully index the bowie table that exists in an on premise database (or indeed a database that is also in the cloud) but the index itself will be created and maintained in the Oracle Cloud. The database is fully aware of the actual physical location of the index, as it is when created in a traditional tablespace, and will simply update the index structure as DML is applied to the base table as it has always done. All Oracle indexing types and options will be fully supported, including Oracle Text and Spatial indexes.

Like indexes stored in a tablespace, these indexes will need to be rebuilt or coalesced every now and then, but you won’t need to worry about this as Oracle will now implicitly do this for you in the Oracle Cloud. I’m currently hiring a team of DBAs with expertise in the “art” of rebuilding indexes who will ensure all your indexes are rebuilt automatically for customers as necessary. By default, these operations will usually be performed on Sundays within the Oracle Cloud, but as indexes can be rebuilt online and as these indexes are no longer physically stored within your database, it’s an operation that is now performed automatically and seamlessly for you. Customers no longer need concern themselves with these types of index operations or when they occur, so while “we” work over the weekend, your DBAs can instead relax and keep their gardens nice and weed free. Please leave a comment on the blog if you think you have what it takes to effectively rebuild Oracle indexes and are interested in joining my elite IDXaaS team !!

For customers that choose to use the “Premium IDXaaS” offering, Oracle will automatically create or drop indexes as required in the Oracle Cloud. Customers no longer have to make the decision on what indexes should be created on which columns in which order; this will be performed automatically for you by Oracle’s IDXaaS. By running new advanced analysis routines on your database workloads, the Premium IDXaaS offering will create an index if it will help the efficiency of any of your current workloads. Conversely, if Oracle believes an index is redundant (for example if your database is running on Exadata), indexes will first be automatically made “invisible” and if there are no serious repercussions, will be automatically dropped 7 days later. DBAs and Developers will no longer need to worry about which indexes to create/drop, significantly reducing the complexities in managing both current and future Oracle database environments.

Oracle will also introduce a new package that will easily migrate all your existing indexes for you into the Oracle Cloud. The DBMS_INDEX_CLOUD package will automatically migrate all indexes for a specified Table/Partition/Schema/Pluggable Database/Database into the Oracle Cloud, which will be a fully online, seamless operation. Once completed, associated indexes within “on premise” tablespaces will effectively be dropped and be replaced with their Cloud equivalents. All that “expensive” storage that was previously tied up in storing all those Oracle indexes can now be freed up to store much more business rich data such as emails, selfie photos, David Bowie music catalogs, etc. Note that these significant storage savings will also translate into smaller backups, smaller Standby databases, smaller Production clones, etc. Importantly, the Oracle Cost Based Optimizer will be fully aware of the Cloud Indexes (as their definitions, statistics and other metadata are still stored within the database data dictionary) and will use the Cloud Indexes as appropriate. No changes to existing SQL are necessary; the CBO will decide to use an index in the cloud in the same manner as it chooses to use an index currently. There is also a new Cloud Index execution path option that allows data to be retrieved via the Cloud Index in a “good enough” manner which is much more efficient than a normal index scan, although it might not necessarily retrieve all the possible data that meets an SQL predicate. It effectively provides what we refer to as that “internet” data retrieval experience.

Oracle Open World 2016 will be the time of the official release for this exciting new Oracle Cloud offering. I will be presenting a number of sessions at OOW16 on this new IDXaaS and there will be a number of live demos to demonstrate its unique capabilities and the simplicity of the migration process. However, there will be a number of announcements before then, including the opportunity to be involved in a beta customer program. There will also be a number of hands-on workshops being conducted globally, with customers getting the chance to see for themselves how easy it is to move database indexes into the Oracle Cloud.

Lastly and perhaps most exciting of all, this new IDXaaS capability will initially be a free option with all current Oracle databases from 12c Release 2 onwards. You will be able to migrate all your existing 12.2 Oracle database indexes onto the Oracle Cloud and they will be stored, maintained and administrated for free as part of your current Oracle database licenses. All Oracle database editions will be supported. This free offer however is likely to end once the 12c Release 2 database is officially released, but the additional pay as you go licensing constructs are proposed to be extremely competitive, with yearly “indexed” price increments. Also to be subsequently released will be Oracle IDXaaS support for most other commercial databases including DB2, SQL Server, NoSQL, MySQL, Sybase, Hana, Hadoop and Notepad.

So administering Oracle (and other) Databases will soon change forever. Oracle indexes will no longer have to be a concern for customers, as all database indexes can instead be stored in the Oracle Cloud via the new IDXaaS offering, allowing Oracle to automatically store and manage these tricky database objects for you. Not only will index rebuilds be a thing of the past, but so will all concerns regarding index storage, creation and maintenance. I’m very excited to be involved in this new undertaking and indeed with my move to Oracle HQ in San Francisco in the coming months (I’ll be up in the Clouds level of Building 1). I look forward to talking IDXaaS with you in the days to come:)

 

This of course is an April Fools joke. Sorry !!

New Round of Let’s Talk Database Events February 12, 2016

Posted by Richard Foote in Oracle Indexes.
3 comments

I’ll be presenting a new round of “Let’s Talk Database” events around Australia and NZ next month. These are free events but have often “sold out” in the past so booking early is recommended to avoid disappointment.

All events run between 9:00am – 12:30pm and are followed by a networking lunch.

Currently, the confirmed events are:

  • Sydney – Tuesday, 22 March 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: mitch.foster@oracle.com
  • Melbourne – Wednesday, 23 March 2016: Let’s Talk Oracle Database: Oracle 12c Database New Features. To register, email: johanne.sergeant@oracle.com SOLD OUT !!
  • Canberra – Thursday, 31 March 2016: Let’s Talk Database: Intro To CBO Costings, APEX New Features and Software-on-Silicon. To register, email: richard.foote@oracle.com
  • Perth – Thursday, 5 May 2016: Let’s Talk Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: dennis.ward@oracle.com
  • Adelaide – Friday, 6 May 2016: Let’s Talk Database: Intro To CBO Costings, Database Futures. To register, email: brodie.james@oracle.com
  • Wellington – Wednesday, 11 May 2016: Let’s Talk Oracle Database: Oracle 12c Database New Features. To register, email: tanya.fea@oracle.com
  • Brisbane: Wednesday, 18 May 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: gordon.laing@oracle.com

 

More “Let’s Talk Database” events will be added in the coming days.

Details of sessions:

Oracle 12c Database New Features

An in-depth technical look at many of the exciting enhancements and new capabilities introduced in the Oracle 12c Database that help make life easier for the DBA and Oracle Developer. Among a number of topics, we’ll have a look at changes with respect to the Cost Based Optimizer, new statistics gathering options, discuss many of the new indexing related capabilities such as Partial Indexes, Advanced Index Compression, Asynchronous Global Index Maintenance, etc. and look at many of those cool little things such as In-Database Archiving, Automatic Data Optimization, Identity Columns, Row-Limiting Clauses, Temporary Undo, Online Move operations, Clustering Attributes, Temporal and Improved Flashback Queries etc., to name but a few of the features that can sometimes make all the difference. Might even squeeze in a bit of Database In-Memory. So plenty to talk about !!!

Performance Diagnostics – How To Correctly Use AWR Reports

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting as they contain so much database diagnostic information, they don’t even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.

Time permitting, we’ll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH)  and SQL Extended Tracing and scenarios when they can prove to be invaluable in addressing database performance issues.

Intro To CBO Costings, APEX New Features and Software-on-Silicon

A common issue is confusion over why the Oracle Cost Based Optimizer (CBO) has made the decision to use a full table scan, or to use that index rather than the one that appears more appropriate. The costing figures that the CBO derives and displays in execution plans are often viewed as cryptic, which is unfortunate as these costings are key to CBO decisions. Often the attempt to resolve such issues is to set inappropriate database parameters that introduce as many issues as they resolve. This session looks under the covers of the CBO to see how it precisely derives its various costings and hence how to correctly troubleshoot and resolve inefficient queries. Bring a calculator, it might come in handy :)

Oracle Application Express (APEX) is a rapid development environment built into the Oracle database. APEX 5 is the latest and largest release of Application Express. This APEX session will include an overview of APEX concentrating on new features, followed by a live demonstration. The session will also look at how easy it is to deploy APEX applications in the cloud.

Oracle’s Software-in-Silicon features extend the concepts of co-engineering hardware and software inherent in engineered systems such as Exadata. Embedding software which has traditionally sat in the database layer into the hardware layer can yield both performance and security beyond that available from a traditional CPU. This presentation covers how Oracle Database 12c can leverage Software-in-Silicon features, and how third-party developers can make use of the new features within their own applications.

 

Hope to see you at one of these events :)

Indexes and Initrans (Blackstar) January 14, 2016

Posted by Richard Foote in Oracle Indexes.
7 comments

It’s been a very tough week, so to help keep my mind off rather sad events, I thought I’d finish off one of my unpublished articles.

Initrans is a physical attribute that determines the initial number of concurrent transaction entries allocated within each data block for a given table/index/cluster. Every transaction that updates a block has to acquire an Interested Transaction List (ITL) slot, in order to store the transaction id, rollback information and ultimately lock the necessary row/index entry within the block.

For a table, the inability to acquire a necessary ITL can lead to hanging transactions and the resultant poor performance. A simple demonstration.

I’ll first create and populate a table with PCTFREE 0 so that the table blocks are full with no free space:

SQL> create table ziggy (id number, code number, name varchar2(42)) pctfree 0;

Table created.

SQL> insert into ziggy select rownum, mod(rownum,100), 'Ziggy Stardust'
from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

If we look at a partial block dump of an example block from the table (which happens to hold the first rows inserted):

Block header dump: 0x01835b37
Object id on Block? Y
seg/obj: 0x190b7 csc: 0x00.2affca5 itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x1835b30 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0009.010.0000d52e 0x014001a4.0576.0e --U- 302 fsc 0x0000.02affcf5
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
bdba: 0x01835b37
data_block_dump,data header at 0x1fc8264
===============
tsiz: 0x1f98
hsiz: 0x26e
pbl: 0x01fc8264
76543210
flag=--------
ntab=1
nrow=302
frre=-1
fsbo=0x26e
fseo=0x283
avsp=0x15
tosp=0x15
0xe:pti[0] nrow=302 offs=0
0x12:pri[0] offs=0x718
0x14:pri[1] offs=0x730
0x16:pri[2] offs=0x748
0x18:pri[3] offs=0x760
0x1a:pri[4] offs=0x778

....

0x26a:pri[300] offs=0x6e6
0x26c:pri[301] offs=0x6ff
block_row_dump:
tab 0, row 0, @0x718
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 02
col 1: [ 2] c1 02
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 1, @0x730
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 03
col 1: [ 2] c1 03
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 2, @0x748
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 04
col 1: [ 2] c1 04
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 3, @0x760
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 05
col 1: [ 2] c1 05
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 4, @0x778
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 06
col 1: [ 2] c1 06
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74

....
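As an aside, for those wanting to reproduce such a dump themselves, the file and block of interest can be derived from a ROWID and then dumped to the session trace file. A minimal sketch (the file and block numbers below simply correspond to the bdba 0x01835b37 in the dump above and will of course differ on your system):

SQL> select dbms_rowid.rowid_relative_fno(rowid) rel_fno,
            dbms_rowid.rowid_block_number(rowid) block_no
     from ziggy where id = 1;

   REL_FNO   BLOCK_NO
---------- ----------
         6     219959

SQL> alter system dump datafile 6 block 219959;

System altered.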

The key aspect to note here is that the table block has been allocated two ITL entries by default. Therefore, two concurrent transactions can safely update this block, but as there is insufficient free space for an additional ITL to be dynamically added, what happens if we have three concurrent transactions ?

In session one:

SQL> delete ziggy where id=1;

1 row deleted.

In session two:

SQL> delete ziggy where id=2;

1 row deleted.

But in session three:

SQL> delete ziggy where id=3;

 

It hangs and will do so until one of the other transactions complete.

So Initrans can preallocate more ITL entries (up to 255) to prevent these types of scenarios.
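For example (a minimal sketch), we could simply alter the table’s INITRANS setting. Note though that the new setting only applies to newly formatted blocks, so for existing blocks to pick it up, the table would also need to be reorganised (with any dependent indexes then rebuilt):

SQL> alter table ziggy initrans 3;

Table altered.

SQL> alter table ziggy move;

Table altered.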

But what of indexes, is the story the same ? Well no, as we’ll see.

I’ll create another table, but this time give the table its default 10% of freespace but create the index with a PCTFREE 0:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,100), 'David Bowie'
from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_id_i on bowie(id) pctfree 0;

Index created.

If we look at a partial block dump of the first leaf block in the index:

Block header dump: 0x018ad71c
Object id on Block? Y
seg/obj: 0x190b6 csc: 0x00.2aff9d5 itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02aff9d5
Leaf block dump
===============
header address 33325668=0x1fc8264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 540
kdxcofbo 1116=0x45c
kdxcofeo 1120=0x460
kdxcoavs 4
kdxlespl 0
kdxlende 0
kdxlenxt 25876253=0x18ad71d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 80 01 57 00 00
row#1[8012] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 03
col 1; len 6; (6): 01 80 01 57 00 01
row#2[4523] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 04
col 1; len 6; (6): 01 80 01 57 00 02
row#3[4535] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 05
col 1; len 6; (6): 01 80 01 57 00 03
row#4[4547] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 06
col 1; len 6; (6): 01 80 01 57 00 04

We notice that it also has two ITL entries by default.

However, the first ITL entry is reserved exclusively for Oracle recursive operations on the index (such as changes as a result of an index block split). An index leaf block can therefore only handle one concurrent transaction by default.

Notice also that the leaf block currently contains 540 index entries.

So what happens if we have 2 concurrent transactions wanting to update the same leaf block ?

In session one:

SQL> delete bowie where id=1;

1 row deleted.

In session two:

SQL> delete bowie where id=2;

1 row deleted.

Both complete successfully !! So what has occurred here ? Let’s look at a fresh partial block dump of this leaf block:

Block header dump: 0x018ad71c
Object id on Block? Y
seg/obj: 0x190b6 csc: 0x00.2affa76 itc: 3 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0007.00b.0000ce8a 0x0140da06.0568.01 CB-- 0 scn 0x0000.02affa76
0x02 0x0001.013.0000cd57 0x01400083.0559.02 ---- 1 fsc 0x000e.00000000
0x03 0x0005.01a.0000d062 0x014009eb.0532.13 ---- 1 fsc 0x000e.00000000
Leaf block dump
===============
header address 33325692=0x1fc827c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 278
kdxcofbo 592=0x250
kdxcofeo 4499=0x1193
kdxcoavs 3907
kdxlespl 0
kdxlende 2
kdxlenxt 25675901=0x187c87d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8012
row#0[4499] flag: ---DS--, lock: 2, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 80 01 57 00 00
row#1[4511] flag: ---D---, lock: 3, len=12
col 0; len 2; (2): c1 03
col 1; len 6; (6): 01 80 01 57 00 01
row#2[4523] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 04
col 1; len 6; (6): 01 80 01 57 00 02
row#3[4535] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 05
col 1; len 6; (6): 01 80 01 57 00 03
row#4[4547] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 06
col 1; len 6; (6): 01 80 01 57 00 04

So we have a few interesting things happening here.

Firstly, Oracle has allocated an additional ITL entry; we now have three in this leaf block. And indeed, both deleted rows have the deleted byte (D) set. But how is this possible when the leaf block was previously full ?

When an index block is full but additional data needs to be inserted (as in the case of a new index entry), Oracle has a mechanism to cater for this. It’s called an index block split and this is exactly what has occurred here. The second transaction needed additional space within the leaf block for a new ITL entry, so Oracle performed a 50-50 index block split, thus providing a half-empty block in which to insert the new ITL entry.

If we look at the number of index entries now in the leaf block (kdxconro), we notice it has gone down from 540 to just 278. If we look at the next leaf block pointer (kdxlenxt), we notice it now points to a different block (25675901=0x187c87d).

An index block split, which ordinarily occurs frequently enough within an index, is a much better option than having a hanging transaction suffering from ITL contention issues.
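If you want some corroborating evidence of the split, the session that performed the delete can check its block split statistics (a quick sketch; the values shown are merely what we’d expect here, with any 50-50 splits being the difference between the two figures):

SQL> select n.name, s.value
     from v$statname n, v$mystat s
     where n.statistic# = s.statistic#
     and n.name in ('leaf node splits', 'leaf node 90-10 splits');

NAME                        VALUE
----------------------- ---------
leaf node splits                1
leaf node 90-10 splits          0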

As we now have heaps of free space within the leaf block, if we now want to perform yet another concurrent transaction:

In session three:

SQL> delete bowie where id=3;

1 row deleted.

It has also completed successfully. A fresh partial block dump of the index leaf block will show that Oracle has just added yet another ITL entry:

Block header dump: 0x018ad71c
Object id on Block? Y
seg/obj: 0x190b6 csc: 0x00.2affb79 itc: 4 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0007.00b.0000ce8a 0x0140da06.0568.01 CB-- 0 scn 0x0000.02affa76
0x02 0x0001.013.0000cd57 0x01400083.0559.02 ---- 1 fsc 0x000e.00000000
0x03 0x0005.01a.0000d062 0x014009eb.0532.13 ---- 1 fsc 0x000e.00000000
0x04 0x0006.015.0000cfbe 0x0140040f.055e.18 ---- 1 fsc 0x000e.00000000
Leaf block dump
===============
header address 33325716=0x1fc8294
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 278
kdxcofbo 592=0x250
kdxcofeo 4475=0x117b
kdxcoavs 3883
kdxlespl 0
kdxlende 3
kdxlenxt 25675901=0x187c87d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 7988
row#0[4475] flag: ---DS--, lock: 2, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 80 01 57 00 00
row#1[4487] flag: ---D---, lock: 3, len=12
col 0; len 2; (2): c1 03
col 1; len 6; (6): 01 80 01 57 00 01
row#2[4499] flag: ---D---, lock: 4, len=12
col 0; len 2; (2): c1 04
col 1; len 6; (6): 01 80 01 57 00 02
row#3[4511] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 05
col 1; len 6; (6): 01 80 01 57 00 03
row#4[4523] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 06
col 1; len 6; (6): 01 80 01 57 00 04

So indexes are far more forgiving than tables when it comes to concurrency issues in relation to transactions acquiring ITL entries.

If we want to avoid the (generally minimal) overhead of dynamically allocating new ITL entries within indexes, or the unnecessary wasted space that such block split operations can cause in compactly created, monotonically increasing indexes, we can use INITRANS to preallocate more than the default two ITL entries for indexes:

SQL> drop index bowie_id_i;

Index dropped.

SQL> create index bowie_id_i on bowie(id) initrans 10;

Index created.

Block header dump: 0x018ad71d
Object id on Block? Y
seg/obj: 0x190b2 csc: 0x00.2aff443 itc: 10 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02aff443
0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x04 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x05 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x06 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x07 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x08 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x09 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x0a 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
Leaf block dump
===============
header address 33325860=0x1fc8324
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 466
kdxcofbo 968=0x3c8
kdxcofeo 1791=0x6ff
kdxcoavs 823
kdxlespl 0
kdxlende 0
kdxlenxt 25876254=0x18ad71e
kdxleprv 25876252=0x18ad71c
kdxledsz 0
kdxlebksz 7844
row#0[7831] flag: -------, lock: 0, len=13
col 0; len 3; (3): c2 05 4b
col 1; len 6; (6): 01 80 01 53 00 a6
row#1[7818] flag: -------, lock: 0, len=13
col 0; len 3; (3): c2 05 4c
col 1; len 6; (6): 01 80 01 53 00 a7

But this is generally unnecessary for indexes and just wastes space within index leaf blocks.

Now back to my grieving:(

David Bowie 1947-2016. My Memories. January 12, 2016

Posted by Richard Foote in Oracle Indexes.
16 comments


In mid-April 1979, as a nerdy 13 year old, I sat in my bedroom in Sale in the north of England listening to the radio when a song called “Boys Keep Swinging” came on by a singer called David Bowie, whom I’d never heard of before. I instantly loved it and taped it the next time it came on the radio via my cheap radio-cassette recorder. A few weeks later I saw David Bowie for the first time on TV performing this song on the Kenny Everett Video Show and from that moment on I was hooked. His performance was not only funny, but the song rocked and Bowie had a swagger that I just thought was instantly cool. OK, I can never be as cool as Bowie but perhaps I can be cool just by being a fan of Bowie.

I had little money but saved my pennies and eventually bought the “Best of Bowie” K-tel album on cassette (which I still have) to check out his previous work and just couldn’t believe that I loved each and every track. Space Oddity, Life on Mars, Jean Genie, Sound and Vision, “Heroes”, etc. etc. How can one person have so many great songs ?? So began the thrilling journey during the following years of slowly buying his back catalogue, album by album. Hunky Dory, “Heroes”, Diamond Dogs, his first “David Bowie” album, each and every one of them brilliant and I can still remember that feeling of sitting in my bedroom or in the family room back in Sale, listening to each of them for the first time.

In 1980, Bowie released “Scary Monsters and Super Creeps”, his first album release while a Bowie fan. It was stunning and watching the “Ashes to Ashes” video for the first time was an amazing experience. It felt more like a movie than a music video. I was excited that the single made No 1 in the UK charts, a bit like my football team winning a match.

In 1981, my parents bought a record player and the first album I bought on record (previously they had all been on cassette) was “The Rise and Fall of Ziggy Stardust and the Spiders From Mars”. I remember mum being cranky with me for playing it so loud, but I said I had to; look, it says so on the back cover: “To be played at maximum volume”.

That year, I got my first book on David Bowie for my birthday, “David Bowie: An Illustrated Record” by Roy Carr and Charles Shaar Murray. I treasure it still, even though I now have over 70 books on Bowie. Yes, I love music and I love to read.

In 1982, we moved to Canberra, Australia, but my Bowie odyssey continued. The first album I bought soon after our move was Station to Station. What an album. That first Aussie Christmas, I got the “really” expensive “David Live” double-album and just played it to death. What a great time.

In 1983, Bowie toured Australia during the huge “Let’s Dance” world tour and my parents bought me a ticket and bus ride to see him live in Sydney. I couldn’t sleep the night before and after getting to the Sydney Show Grounds nice and early, managed to get myself to the second front row. I spent hours in eager excitement, reading through the program (they were like mini books in those days), and then came the moment when I saw him live for the first time. I was THAT close to him and I know he looked at me several times during the show.

As the years passed, I finally bought all his albums and started making enough money that I didn’t have to sell some books or whatever in order to afford the latest magazine that featured Bowie on the cover. Having some money came particularly useful as his music got re-released on CD format !!

In 1987, Bowie toured Australia again, this time as part of the “Glass Spiders” world tour. I had my first full-time job but no leave. Tricky, especially as Bowie was playing 8 shows at the Sydney Entertainment Arena. So I bought 2 tickets to each show and spent one of the most wonderful weeks of my life going to work, leaving at around 4pm, driving to Sydney with someone different at most shows and then driving home afterwards to be in bed by 2am to do it all again the next day. Yes, I’m a fully fledged David Bowie fan and yes, I loved each and every show:)

While many people stopped following Bowie after his Tin Machine period (which I of course loved), I continued to follow his career keenly and just couldn’t wait to buy his latest album, be it “Black Tie White Noise”, “The Buddha of Suburbia” or one of my all-time favorites, “Outside”. The great albums continued on into the 2000s when he joined forces again with the great Tony Visconti as record producer.

And so came February 2004 and Bowie’s last tour of Australia, the “Reality” world tour. This time I went to both of his concerts, again at the Sydney Entertainment Arena, now some 17 years since he last toured. He looked and sounded great and it was fantastic to have my wife with me this time to share the experience. I remember as he walked off after the final show I had a deep and sad feeling that I would never see him live again …

A few months later, he had a heart attack while on stage and the tour finished prematurely. Except for the odd single and public appearance he was rarely seen, before he finally disappeared from the scene in 2006. I continued to play and love his music but with rumours that he wasn’t well health wise, I really thought he had finally retired from making music. Or so it seemed.

While on holiday in Hawaii in January 2013, my son said “Hey Dad, Bowie’s releasing a new single”. I thought it was a joke but to my utter amazement and sheer joy, not only had he just released a new single, “Where Are We Now?”, but also a new album, “The Next Day”. Not only was Bowie back, but the album was as good as anything he’d done. What a comeback.

Last week on Friday (was it really just a few days ago !!), Bowie released his new album, Blackstar and again, I couldn’t contain my excitement. The Blackstar single and video were both stunning as was the follow-up single Lazarus. But I was a touch worried. He looked frail and thinner than he’d been for a while. But hey, this is Bowie, the Thin White Duke, he often looks thin and frail.

On Sunday night I had a dream about Bowie (not surprising really, as his new songs were going through my mind), but I dreamt he had died (seriously). I was sooooo relieved it was just a silly dream. But on Monday at 6pm, as I was leaving work in the car and the ABC news came on the radio, I was shocked, stunned, shattered to hear that David Bowie had indeed suddenly passed away. I just burst into tears in the car park (thankfully it was late and nobody was around) and then my wife rang and then messages from friends and colleagues came pouring in, and so life post David Bowie began.

It’s funny how so many people have said to me how I was the first person they thought of after they heard the news David Bowie had passed away. Some people contacted me who I hadn’t spoken to in years (and thank you, it really means a lot). I guess I wasn’t a closet fan. But that’s because David Bowie for the past 37 odd years has been a very real, important part of my life, one of my best ever friends even though I never actually (quite) met him in person. But I knew him like a best friend through his music and the incredible cultural influence he’s had on my life and those of so many many others. Hey, we were even both born in South London.

While it’s deeply sad that I’ll never again get to enjoy a new David Bowie album, how lucky am I to have lived at the same time as the great genius that is/was David Bowie. And though he’s now gone, his music, legacy and wonderful memories will live on with me for the rest of my life. He was indeed the soundtrack of my life. The Starman is finally up among the stars.

Thank you David Bowie. RIP.


An Index or Disaster, You Choose (It’s The End Of The World As We Know It) October 30, 2015

Posted by Richard Foote in Oracle Indexes, skip_unusable_indexes.
3 comments

This came up recently with a customer, and I saw a related question on OTN that made me realize I haven’t previously discussed the SKIP_UNUSABLE_INDEXES parameter.

Let me start by making a very simple point. All it potentially takes to make your entire database effectively inoperable or indeed to maybe bring down a RAC instance is for one teeny weeny index to become unusable. That’s it.

If the unusable/missing index is used often enough on a table that is big enough, the resultant explosion of load on the database generated by 10,000s of (say) massive Full Table Scans can be catastrophic for all users of the database (or indeed for all users of the disk subsystem or database servers, as in the case of my customer).

Sometimes the lesser evil would be to simply ensure the statement(s) that were reliant on the now unusable index are prevented from executing and causing general havoc, so that all other database activities can continue unabated.

Hence why we have the SKIP_UNUSABLE_INDEXES parameter. When set to TRUE (the default), the CBO simply ignores the fact there’s an index in an unusable state it would otherwise have used and chooses the next best available plan (which might be plain horrendous, such as a FTS on a massive table).

If however SKIP_UNUSABLE_INDEXES is set to FALSE, then an error is generated if an index the CBO would have used is in an unusable state. Yes, such statements now fail to run, causing possible frustration for some end users, but at least the “damage” is limited to just those users, with the rest of the database community blissfully unaware and unimpacted by the unusable index.
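Note the parameter is dynamic, so you can experiment safely at the session level before changing the behaviour database-wide (a minimal sketch):

SQL> alter session set skip_unusable_indexes=false;

Session altered.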

To illustrate, a simple little example. Let’s just create a table and populate it with 1M rows. Note that the CODE column has 100,000 distinct values, with 10 rows returned for each value.

SQL> create table ziggy (id number, code number, name varchar2(42), description varchar(42));

Table created.

SQL> insert into ziggy select rownum, mod(rownum,100000), 'ZIGGY STARDUST', 'AND THE SPIDERS FROM MARS' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

Let’s create an index on this column as this would be a really cheap way to find a particular CODE of interest:

SQL> create index ziggy_code_i on ziggy(code);

Index created.

So a query that just selects a CODE of interest is going to be nice and efficient:

SQL> select * from ziggy where code=42;

10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 3294205578

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |    10 |   510 |    13   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY        |    10 |   510 |    13   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_CODE_I |    10 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CODE"=42)

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
14  consistent gets
0  physical reads
0  redo size
1325  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
10  rows processed

The index is used as expected and at just 14 consistent gets, it burns little CPU and performs no physical IOs.

If however the index were to become unusable …

SQL> alter index ziggy_code_i unusable;

Index altered.

And we now re-run the query:

SQL> select * from ziggy where code=42;

10 rows selected.
Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |    10 |   510 |  2122   (1)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| ZIGGY |    10 |   510 |  2122   (1)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("CODE"=42)
Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
7765  consistent gets
7760  physical reads
0  redo size
965  bytes sent via SQL*Net to client
551  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
10  rows processed

We can see the query is now using the next best execution plan involving a FTS on our “large” table, which is performing many more consistent gets and physical IOs. If this query (or others also dependent on the now unusable index) were executed frequently enough and if the table was big enough, the resultant increase in both CPU and IO could be crippling to the entire database environment(s).

Rather than continually running this now extremely costly query, perhaps the better alternative is to prevent this query from running at all.

By changing the SKIP_UNUSABLE_INDEXES parameter to FALSE:

SQL> alter system set skip_unusable_indexes=false;

System altered.

If we try now to execute this query:

SQL> select * from ziggy where code=42;
select * from ziggy where code=42
*
ERROR at line 1:
ORA-01502: index 'BOWIE.ZIGGY_CODE_I' or partition of such index is in unusable
state

We get our beloved (in this scenario) ORA-01502 error. We now prevent the 10,000s of expensive FTS executions and the general database remains unaffected.
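Of course, whichever setting you choose, you still need to know which indexes actually require attention so they can be rebuilt when convenient. A simple dictionary query does the trick (a minimal sketch; partitioned indexes would additionally require a check of USER_IND_PARTITIONS):

SQL> select index_name, status from user_indexes where status = 'UNUSABLE';

INDEX_NAME   STATUS
------------ --------
ZIGGY_CODE_I UNUSABLE

SQL> alter index ziggy_code_i rebuild online;

Index altered.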

So you have the choice in what to do here. In the event of an unusable index, is there a sufficiently “good enough” alternative plan that makes executing the statements impacted by the unusable index viable, or is the resultant impact on the database just too great, making the prevention of such statements from executing at all the better option ?

It’s potentially a quick fix for a dire database emergency.

Richard Foote: Upcoming Presentation Events (David Live) October 20, 2015

Posted by Richard Foote in Oracle Indexes.
4 comments

I’ll be doing the rounds in Australia/NZ in the coming weeks, so plenty of opportunity to catch up and see me in action:)

I’ll be doing a “National Tour” of my Let’s Talk Oracle Database sessions that I’ve been running locally in Canberra for a number of years. All events have limited places available so please register early to avoid disappointment.

All Let’s Talk Oracle Database events are free, start with registration and coffee between 8:30am – 9:00am, conclude at 12:30pm and are followed by lunch, networking and an informal Q&A. Details of events and dates are:

11 November Brisbane: Let’s Talk Oracle Database: Oracle 12c Database New Features (click here to register). Note: This event only has very few places remaining.

17 November Canberra: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports (email richard.foote@oracle.com to register). Note: This event is now officially FULL.

19 November Sydney: Let’s Talk Oracle Database: Oracle 12c Database New Features (email mitch.foster@oracle.com or click here to register)

25 November Adelaide: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports (email brodie.james@oracle.com to register).

I’ll also be participating on the OTN APAC 2015 Tour with a number of other great presenters. I’ll be presenting at the following events:

30 November Sydney (click here to register)

2 December Perth (click here to register)

4 December Wellington (click here to register)

Summary of the Let’s Talk Oracle Database sessions:

Oracle 12c Database New Features

An in-depth technical look at many of the exciting enhancements and new capabilities introduced in the Oracle 12c Database that help make life easier for the DBA and Oracle Developer. Among a number of topics, we’ll have a look at changes with respect to the Cost Based Optimizer, new statistics gathering options, discuss many of the new indexing related capabilities such as Partial Indexes, Advanced Index Compression, Asynchronous Global Index Maintenance, etc. and look at many of those cool little things such as In-Database Archiving, Automatic Data Optimization, Identity Columns, Row-Limiting Clauses, Temporary Undo, Online Move operations, Clustering Attributes, Temporal and Improved Flashback Queries etc., to name but a few of the features that can sometimes make all the difference. Might even squeeze in a bit of Database In-Memory. So plenty to talk about !!!

Performance Diagnostics – How To Correctly Use AWR Reports

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting as they contain so much database diagnostic information, they don’t even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.

Time permitting, we’ll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH)  and SQL Extended Tracing and scenarios when they can prove to be invaluable in addressing database performance issues.

Hope to see you at one of these events:)

Index Advanced Compression: Multi-Column Index Part II (Blow Out) September 24, 2015

Posted by Richard Foote in Advanced Index Compression, Concatenated Indexes, Index Column Order, Index Rebuild, Oracle Indexes.
add a comment

I previously discussed how Index Advanced Compression can automatically determine not only the correct number of columns to compress, but also the correct number of columns to compress within specific leaf blocks of the index.

However, this doesn’t mean we can just order the columns within the index without due consideration from a “compression” perspective. As I’ve discussed previously, the column order within an index can be very important (especially with regard to the use of the index if the leading column of the index is not specified in the SQL), including with regard to the possible compression capabilities of an index.

Advanced Index Compression does not change this and if we order columns inappropriately, one of the consequences can be that the index simply can’t be compressed.

To illustrate, back to my simple little example:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

But this time, I’m going to create the index with the columns the other way around from the previous post. The effectively unique ID column is now the leading column in the index, followed by the CODE column that indeed has many duplicate values. There is a “myth” that suggests this is actually a more “efficient” way to order an index: put the column with the most distinct values first. This is of course not true (yes, I’ve covered this one before as well).

SQL> create index bowie_idx on bowie(id, code) pctfree 0;

Index created.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2363          2

So the index without compression has 2363 leaf blocks.

As the leading column is effectively unique, we simply can’t now compress this index effectively. That’s because compression requires there to be duplicate index entries starting with at least the leading column. If the leading column has few (or no) duplicates, then by compressing the index Oracle is effectively creating a prefixed entry within the leaf block for each and every index entry. The whole point of index (or table) compression is to effectively de-duplicate the index values but there’s nothing to de-duplicate if there are no repeating values in at least the leading column of the index.

If we attempt to fully compress the index anyway:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3115          2

It actually results in a bigger, not smaller index. The leaf block count has gone up from 2363 to 3115.

Unlike the previous post where the columns in the index were the other way around, if we attempt to just compress the first column, it makes no difference to the inefficiency of the index compression because the number of prefix entries we create remains exactly the same:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3115          2

So the index remains at the higher 3115 leaf blocks.

The good thing with Advanced Index Compression is that we can “give it a go”, but it will not result in a larger index structure. If there’s nothing to compress within a leaf block, Oracle just ignores it and moves on to the next leaf block. If there’s nothing to compress at all within the index, the index remains the same as if it’s not been compressed:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2363          2

So the index is now back to 2363 leaf blocks, the same as if it wasn’t compressed at all. No it hasn’t helped, but at least it hasn’t made things worse.

So the order of the columns still plays a vital part in the “compress-ability” of the index, even with Index Advanced Compression at your disposal. If both the ID and CODE columns are referenced in your code, then having CODE as the leading column of the index would both improve the manner in which the index can be compressed and make an Index Skip Scan viable in cases where the CODE column is occasionally not specified.
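To sketch the point (the index name here is hypothetical, and the plan is simply what we would expect the CBO to produce given the CODE column only has 10 distinct values), a query specifying just the ID could still use an index led by CODE via an Index Skip Scan:

SQL> create index bowie_code_id_i on bowie(code, id);

Index created.

SQL> select * from bowie where id = 424242;

-----------------------------------------------------------------
| Id  | Operation                           | Name              |
-----------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE             |
|*  2 |   INDEX SKIP SCAN                   | BOWIE_CODE_ID_I   |
-----------------------------------------------------------------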

Now, if we change the leading column and create some duplicates (in this case, we update about 10% of the rows to now have duplicate values in the leading ID column):

SQL> update bowie set id=42 where id between 442000 and 542000;

100001 rows updated.

SQL> commit;

Commit complete.

SQL> alter index bowie_idx rebuild nocompress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2338          2

With a whole bunch of IDs with a value of 42, the non-compressed index now has 2338 leaf blocks. Yes, 10% of the leading columns have duplicates, but 90% of the index doesn’t and remains effectively unique. So if we try and compress this index now:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2941          2

The compressed index now has 2941 leaf blocks and is still larger than the 2338 leaf block non-compressed index. Yes, it’s compressed the 10% of the index that it could, but the inefficiencies in dealing with the other 90% have resulted in an overall larger index. So not too good really.

Again, compressing just the leading ID column doesn’t improve matters:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2977          2

In fact, at 2977 leaf blocks it’s even worse than compressing the whole index, because by compressing both columns we could at least also compress the duplicate CODE values within that 10% of the index where we had duplicate ID values. By compressing just the ID column, we don’t get the benefit of compressing the duplicate CODE values. So not very good either.

In either case, compressing the index is ineffective as we end up with a bigger, not smaller index.

But not with Index Advanced Compression:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2265          2

We now have an index structure at just 2265 leaf blocks that is indeed smaller than the non-compressed index (2338 leaf blocks), because Oracle can now compress just the 10% of the index where compression is effective and simply ignore the rest of the index (90%) where compression is ineffective.
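As a quick sanity check (a minimal sketch), the data dictionary also confirms how the index is now compressed:

SQL> select index_name, compression from user_indexes where index_name = 'BOWIE_IDX';

INDEX_NAME COMPRESSION
---------- -------------
BOWIE_IDX  ADVANCED LOW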

The best of both worlds: Index Advanced Compression compresses just the parts of an index where it can do so effectively, while ignoring (and so not making matters worse in) the parts of the index where compression is ineffective.

An indexing no-brainer …

Index Advanced Compression: Multi-Column Index Part I (There There) September 17, 2015

Posted by Richard Foote in 12c, Advanced Index Compression, Concatenated Indexes, Index Rebuild, Oracle Indexes.
add a comment

I’ve discussed Index Advanced Compression here a number of times previously. It’s the really cool additional capability introduced to the Advanced Compression Option with 12.1.0.2, that not only makes compressing indexes a much easier exercise but also enables indexes to be compressed more effectively than previously possible.

Thought I might look at a multi-column index to highlight just how truly cool this new feature is in automatically managing the compression of indexes.

First, let’s create a little table and multi-column index:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_idx on bowie(code, id) pctfree 0;

Index created.

OK, the key thing to note here is that the leading CODE column in the index only has 10 distinct values and so is repeated very frequently. However, the second ID column is effectively unique, such that the overall index entry is likewise effectively unique. I’ve created this index initially with no compression, but with PCTFREE 0 to make the non-compressed index as small as possible.

If we look at the size of the index:

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2361          2

We notice the index currently has 2361 leaf blocks.

I’ve previously discussed how index compression basically de-duplicates the indexed values by storing them in a prefix table within the index leaf block. These prefix entries are then referenced in the actual index entries, meaning it’s only necessary to store repeated values once within a leaf block. Only repeated index values within an index leaf block can therefore be effectively compressed.

In this example, it would be pointless to compress both indexed columns, as this would only result in a unique prefix entry for each and every index entry, given that the ID column is unique. In fact, the overhead of having the prefix table entry for each and every index entry would actually result in a larger, not smaller, overall index structure.
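Incidentally, if you want Oracle’s own opinion on the optimal number of leading columns for traditional compression, one option is to validate the index structure and query INDEX_STATS (a sketch only; note that ANALYZE ... VALIDATE STRUCTURE locks the underlying table for its duration, so take care on busy systems, and the percentage saving shown below is merely indicative):

SQL> analyze index bowie_idx validate structure;

Index analyzed.

SQL> select name, opt_cmpr_count, opt_cmpr_pctsave from index_stats;

NAME       OPT_CMPR_COUNT OPT_CMPR_PCTSAVE
---------- -------------- ----------------
BOWIE_IDX               1               15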

To show how compressing the whole index would be a really dumb idea for this particular index:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         3120          2

The COMPRESS option basically compresses the whole index and we note that rather than creating a smaller, compressed index structure, the index is in fact bigger at 3120 leaf blocks.

However, as the leading CODE column in the index only has 10 distinct values and so is heavily repeated, it would make sense to just compress this first CODE column only in the index. This of course requires us to fully understand the data associated with the index.

We can do this by specifying just how many leading columns to compress (in this case just 1):

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2002          2

We note the index is indeed smaller than it was originally, now at just 2002 leaf blocks.

So this requires us to make the correct decision in how many columns in the index to compress. Getting this wrong can result in a worse, not better overall index structure.

Now with Advanced Index Compression, we don’t have to make this decision; we can simply let Oracle do it for us. As discussed previously, Oracle can go through the index and decide how best to compress each individual leaf block. In this case, it can automatically determine that it’s only beneficial to compress the CODE column throughout the index.

If we compress this index with the new COMPRESS ADVANCED LOW clause:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2002          2

We note we get the index at the nice, small 2002 leaf blocks, as if we used the correct COMPRESS 1 decision.

However, the story gets a little better than this …

Let’s now modify the contents of the table so that we create some duplicates also for the second ID column:

SQL> update bowie set id=42 where id between 442000 and 542000;

100001 rows updated.

SQL> commit;

Commit complete.

OK, so for about 10% of rows, the ID column value is indeed repeated with the value 42. However, for the remaining 90% of rows (and hence index entries), the ID column remains effectively unique. So we have this 10% section of the index where ID is indeed heavily repeated with the value 42, but everywhere else within the index the ID remains unique.

If we rebuild this index again with no compression:

SQL> alter index bowie_idx rebuild nocompress pctfree 0;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2336          2

We now end up with 2336 leaf blocks (a little smaller than before the update as we’re replacing 10% of the IDs with a smaller value of just 42).

However, the vast majority (90%) of the index entries are still unique, so attempting to compress the entire index is again unlikely to be beneficial:

SQL> alter index bowie_idx rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         2946          2

Indeed, the index is again now bigger at 2946 than it was when it wasn’t compressed.

We can again effectively compress just the CODE column in the index:

SQL> alter index bowie_idx rebuild compress 1;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         1977          2

OK, just compressing the CODE column has indeed resulted in a smaller index structure (just 1977 leaf blocks) as it did before.

Without Advanced Index Compression we have the option to not compress the index (the result is average), compress both columns (the result is worse) or compress just the leading column (the result is better). It’s an all or nothing approach to index compression with the best method decided at the overall index level.

What we don’t have is the option to compress just the leading column where that makes sense, while also compressing both columns in just the 10% portion of the index where that makes sense (where we have lots of repeating 42 values for ID).

We do have this option though with Advanced Index Compression, and indeed it’s performed automatically by Oracle in just those leaf blocks where it’s beneficial, because the decision on how to compress is made not at the overall index level but at the leaf block level. As such, Advanced Index Compression has the potential to compress an index in a manner that was simply not possible previously:

SQL> alter index bowie_idx rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, blevel from dba_indexes where table_name='BOWIE';

INDEX_NAME LEAF_BLOCKS     BLEVEL
---------- ----------- ----------
BOWIE_IDX         1941          2

We notice the index is now even smaller, at just 1941 leaf blocks, than when we compressed just the leading column, as Oracle can now also compress the ID column in just that 10% of the index where we had the repeating ID values.

I can’t emphasise enough just how cool this feature is !!

In fact, I would recommend something I don’t usually recommend, and that is rebuilding all your indexes at least once (where you know the leading column has some repeated values) with the Advanced Index Compression option, so that all indexes can be compressed in their optimal manner.

Note though that this does require the Advanced Compression Option !!
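For anyone wanting to give it a go across a whole schema, a simple query can generate the candidate rebuild statements (a minimal sketch; it naively targets all currently uncompressed B-Tree indexes, so review the generated list and schedule the rebuilds appropriately):

SQL> select 'alter index ' || index_name ||
            ' rebuild compress advanced low;' as rebuild_ddl
     from user_indexes
     where index_type = 'NORMAL'
     and compression = 'DISABLED';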

More later:)

Presenting the Hotsos Symposium Training Day – 10 March 2016 (Heat) September 16, 2015

Posted by Richard Foote in Oracle Indexes.
4 comments

I’ve just accepted an invitation to present the Hotsos Symposium Training Day on 10 March 2016 in sunny Dallas, Texas. In the age of Exadata and In-Memory databases, it’ll be an updated and consolidated version of my Index Internals and Best Practices seminar. With an emphasis on using indexes appropriately to boost performance, it’ll feature lots of tips, tricks and references to David Bowie:)

The Hotsos Symposium (6-10 March 2016), with its focus on performance analysis, is one of the very best conferences I’ve ever presented at and I’m very excited at the prospect of returning next year. The confirmed list of presenters is already impressive: Tanel Põder, Bryn Llewellyn, Kellyn Pot’Vin, Jim Czuprynski, Jeff Smith, Kerry Osborne, Carlos Sierra and now Richard Foote, as I’ll likely present a session or two during the conference proper as well.

If you get the chance, do yourself a favour and join me at one of the best Oracle conferences available and hopefully stay on for the Training Day where there’ll be lots of learning and fun to be had !!

Presenting in Perth on 9 September and Adelaide on 11 September (Stage) August 22, 2015

Posted by Richard Foote in Oracle Indexes.
add a comment

For those of you lucky enough to live on the western half of Australia, I’ll be presenting at a couple of events in both Perth and Adelaide in the coming weeks.

On Wednesday, 9th September 2015, I’ll be presenting on Oracle Database 12c New Features For DBAs (and Developers) at a “Let’s Talk Oracle” event in Perth, with fellow Ex-Oracle ACE Directors Connor McDonald and Chris Muir. Full agenda as follows:

8:30-9:00 Registration and coffee
9:00-10:30 Richard Part I – Database 12c New Features for DBAs (and Developers)
10:30-11:00 Break
11:00-12:30 Richard Part II – Database 12c New Features for DBAs (and Developers)
12:30-1:30 Lunch
1:30-2:30 Chris – Creating RESTful APIs with Oracle REST Data Services (for Developers and DBAs)
2:30-2:45 Break
2:45-4:15 Connor – Database 12c New Features for Developers (and DBAs)
4:15-5:00 Q&A with the ACES !
5:00         Wrap up

This is a free event to be held at the Oracle Offices in Perth so places are strictly limited (and almost exhausted). It should be a fantastic day, so I hope you can make it. To register, just send an email to dennis.ward@oracle.com

The following morning, I’ll likely run an Indexing Q&A breakfast session for the WAOUG. Further details to come.

On Friday, 11th September 2015, I’ll be presenting a similar session on Oracle Database 12c New Features (for both DBAs and Developers) at the Oracle Offices in Adelaide between 9am and 12:30pm. Again, places will be strictly limited on a first come, first served basis. To register, just send an email to brodie.james@oracle.com.

Hopefully, I’ll see you at one of these events. If not, I’m planning to run similar events in the other capital cities in Australia in the coming weeks, more details to follow soon.

Why A Brand New Index Might Benefit From An Immediate Coalesce (One Slip) July 6, 2015

Posted by Richard Foote in ASSM, Index Block Splits, Index Internals, Insert Append, Oracle Indexes, Tree Dumps, Truncate Indexes.
add a comment

A recent question on the OTN Forums, “Reg: Index – Gathering Statistics vs. Rebuild”, got me thinking about a scenario, not unlike the one raised in the question, where a newly populated index might immediately benefit from a coalesce.

I’ve previously discussed some of the pertinent concepts, such as how index rebuilds can make indexes bigger, not smaller, and some of the issues around 90-10 block splits not occurring.

Let me show you a scenario where a newly populated index might benefit from an immediate coalesce.

As usual, I start with my little Bowie table and index on the ID column:

SQL> create table bowie (id number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_id_i on bowie(id);

Index created.

If we look at the current statistics on table and index:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
   1000000       3142            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

    BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
         2        2226    1000000

We note the index currently has 2226 leaf blocks. A look at a partial tree dump of the index:

----- begin tree dump
branch: 0x1834723 25380643 (0: nrow: 4, level: 2)
branch: 0x18340b6 25378998 (-1: nrow: 672, level: 1)
leaf: 0x1834724 25380644 (-1: row:485.485 avs:828)
leaf: 0x1834725 25380645 (0: row:479.479 avs:820)
leaf: 0x1834726 25380646 (1: row:479.479 avs:820)
leaf: 0x1834727 25380647 (2: row:479.479 avs:820)
leaf: 0x1834728 25380648 (3: row:479.479 avs:820)
leaf: 0x1834729 25380649 (4: row:479.479 avs:819)
leaf: 0x183472a 25380650 (5: row:479.479 avs:820)
leaf: 0x183472b 25380651 (6: row:479.479 avs:820)
leaf: 0x183472c 25380652 (7: row:479.479 avs:820)
leaf: 0x183472d 25380653 (8: row:479.479 avs:819)
leaf: 0x183472e 25380654 (9: row:479.479 avs:820)
leaf: 0x183472f 25380655 (10: row:479.479 avs:820)

This shows us that the index is fully populated, with just the default 10% pctfree of free space (most leaf blocks have an avs of 819 or 820 free bytes).
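As an aside for anyone wanting to play along, a tree dump such as the above can be generated via the treedump event, using the index’s object_id as the level (a quick sketch; the object_id below is just a placeholder for whatever value your database returns, and the dump is written to the session trace file):

SQL> select object_id from dba_objects where object_name = 'BOWIE_ID_I';

SQL> alter session set events 'immediate trace name treedump level 123456';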

The forum question mentions a scenario where 70% of the table is archived away. This is done by storing the required 30% of data in a temporary table, truncating the original table and then re-inserting the 30% of data. Something like the following:

SQL> create table bowie_temp as select * from bowie where id > 700000;

Table created.

SQL> truncate table bowie;

Table truncated.

So we store in the temp table all values with an ID > 700000. When a table is truncated, so are all its associated indexes. If we perform a tree dump of the index straight after the truncate:

----- begin tree dump
leaf: 0x1834723 25380643 (0: row:0.0 avs:8000)
----- end tree dump

We can see the index now consists of nothing but an empty leaf block.

If we now re-insert the 30% of data of interest and collect fresh statistics:

SQL> insert into bowie select * from bowie_temp;

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
    300000       1000            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

    BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
         2        1112     300000

We note the number of leaf blocks has effectively halved, down from 2226 to just 1112.

But isn’t this index somewhat larger than it should be ? If we removed 70% of the data, why has the index only reduced by 50% ?

If we look at a partial tree dump of the index now:

branch: 0x1834723 25380643 (0: nrow: 3, level: 2)
branch: 0x18335de 25376222 (-1: nrow: 332, level: 1)
leaf: 0x1834727 25380647 (-1: row:255.255 avs:3922)
leaf: 0x18004f6 25167094 (0: row:255.255 avs:3923)
leaf: 0x18004f2 25167090 (1: row:255.255 avs:3922)
leaf: 0x18004f3 25167091 (2: row:255.255 avs:3923)
leaf: 0x18004f4 25167092 (3: row:255.255 avs:3922)
leaf: 0x1800505 25167109 (4: row:449.449 avs:821)
leaf: 0x18004f5 25167093 (5: row:246.246 avs:4066)
leaf: 0x18004f1 25167089 (6: row:246.246 avs:4067)
leaf: 0x1834724 25380644 (7: row:500.500 avs:5)
leaf: 0x1834725 25380645 (8: row:500.500 avs:5)
leaf: 0x1834726 25380646 (9: row:500.500 avs:5)
leaf: 0x18004f7 25167095 (10: row:500.500 avs:5)
leaf: 0x18004f0 25167088 (11: row:255.255 avs:3922)
leaf: 0x1833d8c 25378188 (12: row:255.255 avs:3923)
leaf: 0x18331ed 25375213 (13: row:255.255 avs:3922)
leaf: 0x18331fd 25375229 (14: row:255.255 avs:3923)
leaf: 0x18331fa 25375226 (15: row:255.255 avs:3922)
leaf: 0x18331c8 25375176 (16: row:255.255 avs:3923)
leaf: 0x18331c9 25375177 (17: row:253.253 avs:3954)
leaf: 0x18331cd 25375181 (18: row:255.255 avs:3923)
leaf: 0x18331d4 25375188 (19: row:255.255 avs:3923)
leaf: 0x18331e0 25375200 (20: row:255.255 avs:3922)

We notice that vast areas of the index now have 50% free space, not the default 10% they had previously.

The “problem” here is that when the data was stored in the temp table, there was nothing to guarantee it would be physically stored in ID order. ASSM tablespaces effectively pick random free blocks below the high water mark of the table, so although the data might well be ordered within a given block, the blocks themselves are not logically ordered with respect to the data being inserted.

A simple select from the temp table displaying the “first” 20 rows illustrates this:

SQL> select * from bowie_temp where rownum <=20;

        ID NAME
---------- ------------------------------------------
    701717 DAVID BOWIE
    701718 DAVID BOWIE
    701719 DAVID BOWIE
    701720 DAVID BOWIE
    701721 DAVID BOWIE
    701722 DAVID BOWIE
    701723 DAVID BOWIE
    701724 DAVID BOWIE
    701725 DAVID BOWIE
    701726 DAVID BOWIE
    701727 DAVID BOWIE
    701728 DAVID BOWIE
    701729 DAVID BOWIE
    701730 DAVID BOWIE
    701731 DAVID BOWIE
    701732 DAVID BOWIE
    701733 DAVID BOWIE
    701734 DAVID BOWIE
    701735 DAVID BOWIE
    701736 DAVID BOWIE

Note that the first selected range of values starts with 701717 and not 700001.
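We can see the same effect at the block level with dbms_rowid: each block in the temp table holds a contiguous range of ID values, but the ranges don’t follow the physical block order (a quick sketch, output omitted):

SQL> select * from (select dbms_rowid.rowid_block_number(rowid) block_no, min(id) min_id, max(id) max_id from bowie_temp group by dbms_rowid.rowid_block_number(rowid) order by block_no) where rownum <= 5;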

Therefore, when the data is loaded back into the table, it is not necessarily ordered as it was previously, and an insert into a full leaf block might not necessarily be for the largest ID value currently within the table/index. That being the case, a 50-50 block split is performed rather than the 90-10 block split that only occurs when it’s the largest indexed value being inserted into a full leaf block. 90-10 block splits leave behind nice, full leaf blocks, but 50-50 block splits leave behind half-empty blocks that never get filled if they don’t house subsequent inserts.

These 50-50 leaf block splits result in a larger index but, most importantly, the resultant areas of free space are not going to be used by subsequent monotonically increasing ID values.
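If you want to confirm which flavour of block split is actually occurring, the session statistics can be compared before and after the insert; the number of 50-50 splits is simply the difference between the two values below (a quick sketch):

SQL> select n.name, s.value from v$statname n, v$mystat s where n.statistic# = s.statistic# and n.name in ('leaf node splits', 'leaf node 90-10 splits');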

To reclaim this effectively unusable index storage, the index would benefit from an immediate coalesce.
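The coalesce itself is straightforward:

SQL> alter index bowie_id_i coalesce;

Index altered.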

Of course, the best cure would be to prevent this scenario from occurring in the first place.

One option would be to insert the data in a manner that ensures the effective ordering of the indexed data:

SQL> truncate table bowie;

Table truncated.

SQL> insert into bowie select * from bowie_temp order by id;

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
    300000       1000            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

    BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
         1         600     300000

We notice that by inserting the data in ID order (via the “order by” clause), we guarantee that an insert into a full leaf block will indeed be for the highest current index value, and so 90-10 block splits are performed. This leaves behind perfectly full leaf blocks and a nice, compact index structure. The index now has only 600 leaf blocks, significantly fewer than before, with no “wasted” storage.

A partial tree dump of the index highlights this:

----- begin tree dump
branch: 0x1834723 25380643 (0: nrow: 600, level: 1)
leaf: 0x1834727 25380647 (-1: row:500.500 avs:5)
leaf: 0x1834724 25380644 (0: row:500.500 avs:5)
leaf: 0x1834725 25380645 (1: row:500.500 avs:5)
leaf: 0x1834726 25380646 (2: row:500.500 avs:5)
leaf: 0x18004f7 25167095 (3: row:500.500 avs:5)
leaf: 0x18004f0 25167088 (4: row:500.500 avs:5)
leaf: 0x18004f1 25167089 (5: row:500.500 avs:5)
leaf: 0x18004f5 25167093 (6: row:500.500 avs:5)
leaf: 0x18004f6 25167094 (7: row:500.500 avs:5)
leaf: 0x18004f2 25167090 (8: row:500.500 avs:5)
leaf: 0x18004f3 25167091 (9: row:500.500 avs:5)
leaf: 0x18004f4 25167092 (10: row:500.500 avs:5)

The index leaf blocks are chock-a-block full with just 5 bytes free, not enough space for another index entry.

Another alternative would of course be to make the index unusable before re-inserting the data into the table, and then rebuilding the index with a pctfree of 0, a safe and appropriate value for monotonically increasing values. This has the added advantage of significantly improving the performance of the bulk insert operation.
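A quick sketch of this approach (assuming skip_unusable_indexes is left at its default value of TRUE, so the bulk insert doesn’t error out on the unusable index):

SQL> alter index bowie_id_i unusable;

Index altered.

SQL> insert into bowie select * from bowie_temp;

300000 rows created.

SQL> commit;

Commit complete.

SQL> alter index bowie_id_i rebuild pctfree 0;

Index altered.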

If the data isn’t monotonically increasing, then 50-50 block splits are fine as the resultant free space would indeed be effectively used.

Update: 7 July 2015

And as Jonathan Lewis kindly reminded me, another method to avoid this issue is to simply use a direct-path insert via the append hint. This records all the index entries as it goes, sorts them and populates the index much more efficiently in one step:

SQL> truncate table bowie;

Table truncated.

SQL> insert /*+ append */ into bowie select * from bowie_temp;

300000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select num_rows, blocks, empty_blocks from dba_tables where table_name='BOWIE';

  NUM_ROWS     BLOCKS EMPTY_BLOCKS
---------- ---------- ------------
    300000        936            0

SQL> select blevel, leaf_blocks, num_rows from dba_indexes where index_name='BOWIE_ID_I';

    BLEVEL LEAF_BLOCKS   NUM_ROWS
---------- ----------- ----------
         1         600     300000

So at 600 leaf blocks, the index is again populated within a fully compacted index structure.
