Next Round Of ANZ “Let’s Talk Database” Events (July/August 2016) June 20, 2016
Posted by Richard Foote in Let's Talk Database, Oracle Indexes.
I’ll be presenting the next round of “Let’s Talk Database” events around Australia and NZ this winter in July/August 2016. These are free events but due to limited places have often “sold out” in the past, so booking early is recommended to avoid disappointment.
All events run between 9:00am – 12:30pm and are followed by a networking lunch.
We always have a fun time at these events and feedback suggests everyone learns something, so I hope if you’re lucky enough to be in the ANZ region, you can make it to your local event 🙂
Currently, the confirmed events and topics are as follows:
- Sydney – Wednesday, 20 July 2016: Let’s Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: mitch.foster@oracle.com
- Melbourne – Thursday, 21 July 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: johanne.sergeant@oracle.com SOLD OUT !!
- Canberra – Thursday, 28 July 2016: Let’s Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: richard.foote@oracle.com
- Wellington – Wednesday, 10 August 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
- Auckland – Thursday, 11 August 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
- Perth – Wednesday, 17 August 2016: Let’s Talk Database: Intro To CBO Costings, Database Futures. To register, email: dennis.ward@oracle.com
- Adelaide – Thursday, 18 August 2016: Let’s Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: brodie.james@oracle.com
- Brisbane – Wednesday, 24 August 2016: Let’s Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: gordon.laing@oracle.com
Details of the sessions:
Performance Diagnostics – How To Correctly Use AWR Reports
I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting as they contain so much database diagnostic information that they don’t even know where to begin to find the truly useful information.
In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.
Time permitting, we’ll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH) and SQL Extended Tracing and scenarios when they can prove to be invaluable in addressing database performance issues.
Introduction Into The CBO (Why Is The CBO Doing A Full Table Scan?)
A common issue is confusion over why the Oracle Cost Based Optimizer (CBO) has made the decision to use a full table scan, or to use one index rather than another that appears more appropriate. The costing figures that the CBO derives and displays in execution plans are often viewed as cryptic, which is unfortunate as these costings are key to the CBO’s decisions. Often the attempt to resolve such issues is to set inappropriate database parameters, which introduces as many issues as it resolves. This session looks under the covers of the CBO to see precisely how it derives its various costings, and hence how to correctly troubleshoot and resolve inefficient queries when they arise. Bring a calculator, it might come in handy 🙂
Database Futures – Exadata Secret Sauce and Software on Silicon
Oracle has recently released the new generation of the Exadata machine, the X6-2. Not only is it bigger and faster, but it also has a host of features and capabilities that are unique to the Exadata storage platform, including Smart Scans, Storage Indexes, Exafusion, Smart Fusion Block Transfer, Dual Format Columnar Flash Cache, I/O Latency Capping, Hybrid Columnar Compression, Sparse Snapshots and In-Memory Database Fault Tolerance. All these unique features and their enhancements with the new X6-2 will be discussed. Additionally, Oracle’s Software-in-Silicon features extend the concepts of co-engineering hardware and software inherent in engineered systems such as the Exadata. Embedding software that has traditionally sat in the database layer into the hardware layer can yield both performance and security beyond what is available from a traditional CPU. This presentation also covers how Oracle Database 12c can leverage Software-in-Silicon features, where the Oracle Database is heading in the future and why you should care.
Oracle Database 12c – Built for Data Warehousing
The Oracle Database is the leading database in the market, but it might come as a surprise to some that it’s also the leading database with respect to Data Warehousing, in terms of both sales and analyst rankings. The focus today is a detailed look at all the special database capabilities that make the Oracle Database the perfect platform for Data Warehouse type applications.
We explore the overall Data Warehousing Information Management architecture and how it relates to both data warehouses and data reservoirs. We then deep dive into standard database features such as Materialized Views, Dimensions, Bitmap Indexes, Query Re-Write, JSON in the database, Partitioning, Parallel Processing, Compression, Attribute Clustering and SQL for analysis, reporting, aggregation and pattern matching.
We then discuss more advanced Oracle database capabilities such as Data Mining, Predictive Analytics, R in the database, Database In-Memory, Big Data SQL, Spatial & Graph, and the specific engineered systems capabilities that make Oracle such an effective Data Warehouse platform, especially when combined with the enterprise capabilities already associated with the Oracle Database such as security, high availability, scalability and manageability.
Storing Date Values As Numbers (The Numbers) June 1, 2016
Posted by Richard Foote in 12c, CBO, Histograms, Oracle Indexes, Storing Dates As Numbers.
In my last couple of posts, I’ve been discussing how storing date data in a character-based column is a really really bad idea.
In a follow-up question, I was asked if storing dates in NUMBER format was a better option. The answer is that it’s probably an improvement over storing dates as strings, but it’s still a really really bad idea. Storing dates in DATE format is easily the best option, as is storing any data in its native data type.
In this post, I’ll highlight a few of the classic issues with storing dates in basic number format as well as showing you some of the calculations on the CBO cardinality estimates.
As usual, the demo starts with a basic little table that I’ll populate with date data stored in a NUMBER column (ZIGGY_DATE):
SQL> create table ziggy (id number, code number, ziggy_date number);

Table created.

SQL> insert into ziggy select rownum, mod(rownum,1000), to_number(to_char(sysdate-mod(rownum,10000), 'YYYYMMDD')) from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.
We’ll now collect statistics on the table:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='ZIGGY';

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
ZIGGY_DATE          10000      .0001 NONE            NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO
So the ZIGGY_DATE column has 10,000 distinct dates (with 100 rows per distinct date), with a column density of 1/10000 = 0.0001.
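As a quick sanity check (a hedged aside; the expected output simply reflects how the table was populated above), we can confirm those numbers directly:

SQL> select count(distinct ziggy_date) num_dates, count(*)/count(distinct ziggy_date) rows_per_date from ziggy;

 NUM_DATES ROWS_PER_DATE
---------- -------------
     10000           100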
Let’s now create a standard B-Tree index on the ZIGGY_DATE column:
SQL> create index ziggy_date_i on ziggy(ziggy_date);

Index created.
If we look at a sample of the data in the column and at the min/max date range:
SQL> select * from ziggy where rownum <11;

        ID       CODE ZIGGY_DATE
---------- ---------- ----------
       776        776   20140412
       777        777   20140411
       778        778   20140410
       779        779   20140409
       780        780   20140408
       781        781   20140407
       782        782   20140406
       783        783   20140405
       784        784   20140404
       785        785   20140403

SQL> select min(ziggy_date) min, max(ziggy_date) max from ziggy;

       MIN        MAX
---------- ----------
  19890110   20160527
We see that all the data in the ZIGGY_DATE column are just number representations of dates, with a range between 10 Jan 1989 and 27 May 2016.
Note there are actually 10,000 days between the dates, but the CBO would estimate a range of 270,417 possible days (20160527 - 19890110 = 270,417). The CBO has no idea that the “numbers” within the column are all dates and that there are ranges of values in which data is relatively popular (e.g. between say 20160101 and 20160131) and ranges of values in which data is relatively unpopular (e.g. between say 20154242 and 20159999, which can’t even represent valid dates).
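To put some numbers on that mismatch (a hedged aside, simply comparing the two kinds of arithmetic), the true date range is tiny compared with the numeric range the CBO must assume:

SQL> select to_date('20160527','YYYYMMDD') - to_date('19890110','YYYYMMDD') actual_day_range, 20160527 - 19890110 assumed_number_range from dual;

ACTUAL_DAY_RANGE ASSUMED_NUMBER_RANGE
---------------- --------------------
            9999               270417

That’s 9,999 days between the min and max values (i.e. 10,000 distinct dates inclusive) versus an assumed range of 270,417 possible values.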
Although not as bad as the range of possible unpopular values found within a character data type (as I discussed previously when storing date data as strings), there is still enough data skew when storing dates as numbers to be problematic for the CBO.
If we select just one date with an equality predicate:
SQL> select * from ziggy where ziggy_date = 20150613;

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2700236208

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |   100 |  1500 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY        |   100 |  1500 |   103   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_DATE_I |   100 |       |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ZIGGY_DATE"=20150613)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        110  consistent gets
          0  physical reads
          0  redo size
       3883  bytes sent via SQL*Net to client
        618  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
The CBO gets things spot on, correctly estimating 100 rows to be returned, as the CBO knows there are only 10,000 distinct values of which only one of those values is being selected.
Selectivity is basically the density of the column = 1/10000 = 0.0001, so the estimated cardinality is 0.0001 x 1M rows = 100 rows. Perfect.
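To spell the arithmetic out (a trivial hedged sketch of the cardinality = density x num_rows calculation):

SQL> select 1/10000 * 1000000 est_cardinality from dual;

EST_CARDINALITY
---------------
            100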
However, if we perform a range based query as follows:
SQL> select * from ziggy where ziggy_date between 20151010 and 20151111;

3300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2700236208

----------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |              |   573 |  8595 |   580   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY        |   573 |  8595 |   580   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_DATE_I |   573 |       |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ZIGGY_DATE">=20151010 AND "ZIGGY_DATE"<=20151111)

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
       3531  consistent gets
          0  physical reads
          0  redo size
     108973  bytes sent via SQL*Net to client
       2961  bytes received via SQL*Net from client
        221  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3300  rows processed
The CBO has got things somewhat incorrect in this example and has underestimated the expected number of rows (573 rows vs. the 3,300 rows actually returned).
The actual number of days between these dates is 33, so the actual ratio of data returned is 33/10000 x 1M rows = 3,300 rows. This is a range of “numbers” that overall covers a relatively “popular” range of date values.
However, Oracle is estimating a range of some 20151111 - 20151010 = 101 days between these dates. As the total range of possible days is 20160527 - 19890110 = 270,417, the estimated ratio of returned rows is 101/270417, plus 2 x the selectivity of a single day for the two implicit equality conditions (as a BETWEEN is effectively >= and <=). The selectivity of one day is just the density of the column, 0.0001, as illustrated in the previous query.
Therefore, the query selectivity is derived as (101/270417) + (2 x 0.0001) = 0.000573, which when multiplied by 1M rows gives the 573 rows estimated by the CBO.
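We can reproduce the CBO’s figure with a quick query (a hedged sketch, assuming the standard range selectivity formula (high - low)/(max - min) + 2 x density discussed above):

SQL> select round(((20151111 - 20151010)/(20160527 - 19890110) + 2*0.0001) * 1000000) est_rows from dual;

  EST_ROWS
----------
       573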
So the CBO is rather significantly *under* estimating the rows to be returned which could result in a sub-optimal execution plan (such as the inappropriate use of an index range scan as in this example, noting the poor clustering of the data).
If we now look at another range scan below:
SQL> select * from ziggy where ziggy_date between 20151225 and 20160101;

800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       | 33023 |   483K|   810  (15)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| ZIGGY | 33023 |   483K|   810  (15)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ZIGGY_DATE">=20151225 AND "ZIGGY_DATE"<=20160101)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
       2824  consistent gets
          0  physical reads
          0  redo size
      23850  bytes sent via SQL*Net to client
       1135  bytes received via SQL*Net from client
         55  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        800  rows processed
The actual number of days between these dates is only 8, so the actual ratio of data returned is 8/10000 x 1M rows = 800 rows. This is a range of “numbers” that overall covers a relatively “unpopular” range of date values.
However, Oracle is estimating a range of some 20160101 - 20151225 = 8,876 days between these dates. As the total range of possible days is 20160527 - 19890110 = 270,417, the estimated ratio of returned rows is 8876/270417, plus 2 x the selectivity of a single day, again for the two implicit equality conditions.
Therefore, the query selectivity is derived as (8876/270417) + (2 x 0.0001) = 0.033023, which when multiplied by 1M rows gives the 33,023 rows estimated by the CBO.
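The same arithmetic sketch with the new range again reproduces the CBO’s figure:

SQL> select round(((20160101 - 20151225)/(20160527 - 19890110) + 2*0.0001) * 1000000) est_rows from dual;

  EST_ROWS
----------
     33023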
So the CBO is rather significantly *over* estimating the rows to be returned, which could again result in a sub-optimal execution plan (such as the inappropriate use of a Full Table Scan in this example). The CBO is simply not picking up the fact that most of the possible values between the “number” ranges aren’t valid dates and can’t possibly exist.
Of course, having dates stored as simple numbers means Oracle has no way of ensuring data integrity and can allow “invalid” dates to be inserted:
SQL> insert into ziggy values (1000001, 42, 20160599);

1 row created.

SQL> rollback;

Rollback complete.
As with dates stored as strings, we can again address these issues by either collecting histograms for such columns and/or by creating a function-based date index on the column:
SQL> create index ziggy_date_fn_i on ziggy(to_date(ziggy_date,'YYYYMMDD'));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='ZIGGY';

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
SYS_NC00004$        10000      .0001 NONE            YES YES
ZIGGY_DATE          10000      .0001 HYBRID          NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO
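Note the HYBRID histogram on ZIGGY_DATE above was collected automatically, presumably thanks to the default SIZE AUTO picking up the recorded column usage from our earlier queries. If you wanted to collect a histogram on the column explicitly, a standard DBMS_STATS call such as the following would do the job (a sketch; the 254 bucket count is just an illustrative choice):

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY', method_opt=>'FOR COLUMNS ZIGGY_DATE SIZE 254');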
The associated query with the equality predicate has accurate estimates as it did previously:
SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') = '13-JUN-2015';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 945728471

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |   100 |  2300 |   103   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY           |   100 |  2300 |   103   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_DATE_FN_I |   100 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')=TO_DATE(' 2015-06-13
       00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
        111  consistent gets
          0  physical reads
          0  redo size
       2877  bytes sent via SQL*Net to client
        618  bytes received via SQL*Net from client
          8  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        100  rows processed
As the virtual column created for the function-based index also has 10,000 distinct values and a corresponding density of 0.0001, the CBO is getting the cardinality estimate of 100 rows spot on.
But importantly, both associated range based queries are now also being accurately costed by the CBO as it now knows the data being searched is date based and hence can more accurately determine the actual expected dates to be returned within the specified “date” ranges.
SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') between '10-OCT-2015' and '11-NOV-2015';

3300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |  3400 | 78200 |  1061  (35)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| ZIGGY |  3400 | 78200 |  1061  (35)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')>=TO_DATE(' 2015-10-10
       00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')<=TO_DATE(' 2015-11-11
       00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
       2991  consistent gets
          0  physical reads
          0  redo size
      95829  bytes sent via SQL*Net to client
       2961  bytes received via SQL*Net from client
        221  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
       3300  rows processed
The CBO is now estimating not 573 rows, but 3,400 rows, which is much closer to the actual 3,300 rows being returned. As a result, the CBO is now performing a more efficient Full Table Scan (due to the poor Clustering Factor of the index) than the Index Range Scan performed previously.
If we look at the other range scan query:
SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') between '25-DEC-2015' and '01-JAN-2016';

800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 945728471

-------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                 |   900 | 20700 |   909   (1)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| ZIGGY           |   900 | 20700 |   909   (1)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | ZIGGY_DATE_FN_I |   900 |       |     5   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')>=TO_DATE(' 2015-12-25
       00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')<=TO_DATE(' 2016-01-01
       00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
          8  recursive calls
          0  db block gets
        861  consistent gets
          7  physical reads
          0  redo size
      18917  bytes sent via SQL*Net to client
       1135  bytes received via SQL*Net from client
         55  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
        800  rows processed
The CBO is now estimating not 33,023 rows, but 900 rows, which is again much closer to the actual 800 rows being returned. As a result, the CBO is now performing a more efficient Index Range Scan than the Full Table Scan it was performing previously.
And of course, via the function-based date index, the database now has a way to protect the integrity of the date data:
SQL> insert into ziggy values (1000001, 42, 20160599);
insert into ziggy values (1000001, 42, 20160599)
*
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month
However, the best way in Oracle to store “Date” data is within a Date data type column …
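To close with the obvious alternative (a final hedged sketch, with a hypothetical table name), a native DATE column gives the CBO accurate date arithmetic and gives the database date integrity for free, with no function-based index tricks required:

SQL> create table ziggy_stardust (id number, code number, ziggy_date date);

Table created.

SQL> insert into ziggy_stardust values (1000001, 42, to_date('20160599','YYYYMMDD'));
insert into ziggy_stardust values (1000001, 42, to_date('20160599','YYYYMMDD'))
*
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month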