Storing Date Values As Characters Part II (A Better Future) May 30, 2016
Posted by Richard Foote in 12c, CBO, Function Based Indexes, Oracle Indexes, Storing Dates as Characters.trackback
In the previous post, I discussed how storing date values within a character data type is a really really bad idea and illustrated how the CBO can easily get its costings totally wrong as a result. A function-based date index helped the CBO get the correct costings and protect the integrity of the date data.
During the demo, I re-collected statistics on the table as the associated hidden virtual column after creating the function-based index doesn’t have statistics.
Before re-collecting statistics:
SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';</pre> COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM HID VIR ------------ ------------ ---------- --------------- --- --- SYS_NC00004$ NONE YES YES BOWIE_DATE 10000 .0001 NONE NO NO CODE 1000 .001 NONE NO NO ID 1000000 .000001 NONE NO NO
And afterwards:
SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE'); PL/SQL procedure successfully completed. SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE'; COLUMN_NAME NUM_DISTINCT DENSITY HISTOGRAM HID VIR ------------ ------------ ---------- --------------- --- --- SYS_NC00004$ 10000 .0001 NONE YES YES BOWIE_DATE 10000 .0001 HYBRID NO NO CODE 1000 .001 NONE NO NO ID 1000000 .000001 NONE NO NO
We can see that the hidden virtual column now has statistics.
But we also notice another difference, that being the BOWIE_DATE column now has a histogram (of type Hybrid).
As discussed in the previous post, the issue here is that the date data within the character column covers only a very specific subset of all the potential character values that could reside within the column. Therefore the CBO is getting the range scan selectivity hopelessly incorrect.
Now that we’ve run a few queries featuring the BOWIE_DATE column in the predicates and as there’s effectively data skew within the column, the column becomes a candidate for a histogram with the default SIZE AUTO collection method.
The histogram now provides the CBO with a much more accurate picture of the distribution of the data within the BOWIE_DATE and that between discrete “date” column values, there are only so many rows that qualify.
As a result of the histogram, the CBO can now make much more accurate cardinality estimates.
If we now re-run the query that actually returns 8300 rows but the CBO previously estimated only 100 rows be returned:
SQL> select * from bowie where bowie_date between '2015 10 10' and '2015 12 31' 8300 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1845943507 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 4152 | 83040 | 1000 (12) | 00:00:01 | |* 1 | TABLE ACCESS FULL | BOWIE | 4152 | 83040 | 1000 (12) | 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("BOWIE_DATE">='2015 10 10' AND "BOWIE_DATE"<='2015 12 31') Statistics ---------------------------------------------------------- 14 recursive calls 0 db block gets 4063 consistent gets 0 physical reads 0 redo size 282075 bytes sent via SQL*Net to client 6635 bytes received via SQL*Net from client 555 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 8300 rows processed
We see that at an estimated 4,152 rows, it’s a much better estimate. Not perfect, but maybe good enough to now get the more efficient Full Table Scan execution plan.
If we re-run the query that returned over 1/2 the table at some 570,000 rows but with the CBO previously estimating only 116 rows:
SQL> select * from bowie where bowie_date between '2000 10 10' and '2016 12 31';</pre> 570800 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1845943507 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU) | Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 572K | 10M | 1012 (13) | 00:00:01 | |* 1 | TABLE ACCESS FULL | BOWIE | 572K | 10M | 1012 (13) | 00:00:01 | --------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("BOWIE_DATE">='2000 10 10' AND "BOWIE_DATE"<='2016 12 31') Statistics ---------------------------------------------------------- 14 recursive calls 0 db block gets 41456 consistent gets 4 physical reads 0 redo size 19292352 bytes sent via SQL*Net to client 419135 bytes received via SQL*Net from client 38055 SQL*Net roundtrips to/from client 1 sorts (memory) 0 sorts (disk) 570800 rows processed
We see that at an estimate of 572K rows, it’s now got this just about right and again has made the right decision with the Full Table Scan execution plan.
Storing date data in character based columns is still a really really bad idea and limits the manner in which date data can be analysed, protected and accessed, but with appropriate histograms in place, at least the CBO has some chance of making a reasonable fist of things with some range based queries.
As a follow-up, I was asked if storing dates in NUMBER format is a better option than as a string. I’ll discuss that next.
[…] trackback […]
LikeLike
Buen día. Tengo 48 años y me hayo estudiando inglés hará unos 4 años.
Este post me ha parecido épico para la memorización del inglés.
Me ha sido muy servicial. Lo voy a añadir a mis favoritos.
Le doy las gracias por el artículo.
LikeLike
hi!,I really like your writing so much! share we communicate more approximately your post on AOL?
I need an expert on this area to solve my problem. Maybe that’s you!
Having a look ahead to look you.
LikeLike
express Employment staffing Services
Storing Date Values As Characters Part II (A Better Future) | Richard Foote
LikeLike
While that may seem just like an extended stretch of time, you can always break
this hour up into 10-minute increments yet still reap the profits.
Fish oil may be proven to improve your (HDL) levels around eight percent based on studies conducted in the University of Milano.
This way you already know whether your levels are healthy
or if they desire improvement to keep your body within the best health possible.
LikeLike