
Hotsos Symposium 2017: Feb 27 – Mar 2 Dallas, Texas (I’m Afraid of Americans) February 6, 2017

Posted by Richard Foote in Oracle Indexes.
3 comments


In just 3 weeks' time, I'll again have the pleasure of presenting at the awesome Hotsos Symposium in Dallas, Texas. It's simply the best conference anywhere for Oracle performance tuning.

This year, I’ll be presenting on a couple of topics:

New Index Features Introduced in Oracle Database 12c Release 2

Oracle Database 12c Release 2 is now available in both Oracle DBaaS and Exadata Express Cloud Services and will soon be generally available. Oracle 12.2 has introduced a number of extremely useful new indexing features and enhancements. These include cool capabilities such as tracking index usage, advanced index compression enhancements, deferred invalidation of cursors during index creation/rebuild, automatic index maintenance during new online operations and JSON indexing enhancements. These will all be discussed in detail, with examples of how they can be usefully deployed.
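As a taster for the index usage tracking piece, here's a minimal sketch (assuming a 12.2 database; treat the view and column names as from memory rather than gospel) of querying the new DBA_INDEX_USAGE view to see how often an index has actually been used:

SQL> select name, total_access_count, total_exec_count, last_used
     from dba_index_usage where owner='BOWIE';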

Real World Examples on how to use AWR Reports to Solve Performance Issues

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting, as they contain so much database diagnostic information that they don't even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.
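For anyone who hasn't generated one before, an AWR report between two snapshots can be produced with the standard script shipped with the database (assuming you're appropriately licensed for the Diagnostics Pack):

SQL> @?/rdbms/admin/awrrpt.sql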

 

Looking forward to catching up with a number of my Oracle friends. Hopefully you can make it; learning lots of really useful stuff in relation to Oracle Performance Tuning is guaranteed 🙂

12.2 Index Advanced Compression “High” Part III (I Can’t Give Everything Away) January 25, 2017

Posted by Richard Foote in 12c Rel 2, 12c Release 2 New Features, Advanced Index Compression, Oracle Indexes.
1 comment so far

If you like the look of the new Index Advanced Compression setting of “High” available with Oracle Database 12c Release 2 as I’ve discussed previously in Part I and Part II, well there’s a way to make this the default index compression method in your 12.2 database.

Let’s begin by creating a table and explicitly creating a NOCOMPRESS index in the BOWIE tablespace:


SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_code_idx on bowie(code) tablespace bowie nocompress;

Index created.

If we look at the size and compression type of this index:


SQL> select index_name, tablespace_name, leaf_blocks, compression
from dba_indexes where index_name='BOWIE_CODE_IDX';

INDEX_NAME      TABLESPACE_NAME      LEAF_BLOCKS COMPRESSION
--------------- -------------------- ----------- -------------
BOWIE_CODE_IDX  BOWIE                       1939 DISABLED

We notice the index has 1939 leaf blocks and that index compression is indeed disabled as expected.

Let's now drop the index and recreate it again in the BOWIE tablespace, but this time without explicitly stating any compression option:


SQL> drop index bowie_code_idx;

Index dropped.

SQL> create index bowie_code_idx on bowie(code) tablespace bowie;

Index created.

SQL> exec dbms_stats.gather_index_stats(ownname=>null, indname=>'BOWIE_CODE_IDX');

PL/SQL procedure successfully completed.

If we look at the index now:


SQL> select index_name, tablespace_name, leaf_blocks, compression from dba_indexes
where index_name='BOWIE_CODE_IDX';

INDEX_NAME      TABLESPACE_NAME      LEAF_BLOCKS COMPRESSION
--------------- -------------------- ----------- -------------
BOWIE_CODE_IDX  BOWIE                        355 ADVANCED HIGH

We notice the index now only has 355 leaf blocks (down from 1939 leaf blocks) and that it has automatically used the new index advanced compression option of "HIGH".

The secret lies with the following new settings.

Firstly, with the new db_index_compression_inheritance parameter, you can specify whether, during index creation, an index inherits its compression attributes from the tablespace, the table, or not at all:


SQL> alter system set db_index_compression_inheritance=tablespace scope=both;

System altered.

Secondly, you can give a tablespace a default index compression attribute that determines how indexes are compressed by default within that tablespace:


SQL> alter tablespace bowie default index compress advanced high;

Tablespace altered.

SQL> select tablespace_name, def_index_compression, index_compress_for
from dba_tablespaces where tablespace_name = 'BOWIE';

TABLESPACE_NAME      DEF_INDE INDEX_COMPRES
-------------------- -------- -------------
BOWIE                ENABLED  ADVANCED HIGH

So in this database, all indexes created within the BOWIE tablespace are automatically created with index advanced compression set to HIGH.

There are however some disadvantages with high index advanced compression that need to be considered.

More to come.

12.2 Index Advanced Compression “High” Part II (One Of My Turns) December 12, 2016

Posted by Richard Foote in 12c Rel 2, 12c Release 2 New Features, Advanced Index Compression, Oracle Indexes.
4 comments

In Part I, I introduced the new Index Advanced Compression option of "HIGH", which has the potential to compress indexes significantly more than previously possible. This is due to new index compression algorithms that do more than simply de-duplicate indexed values within a leaf block.

Previously, any attempt to completely compress a Unique Index was doomed to failure, as a Unique Index by definition only has unique values and so has nothing to de-duplicate. As such, you were previously restricted (quite rightly) to compressing at most n-1 columns within a Unique Index. An attempt to compress all columns in a Unique Index would only result in a larger index structure due to the associated overheads of the prefix table within the leaf blocks, as sketched below.
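As a quick sketch of this previous restriction (using a hypothetical table T with a two-column unique key on A and B), compressing just the leading n-1 columns is legal:

SQL> create unique index t_uk_i on t(a, b) compress 1;

whereas COMPRESS 2 (i.e. all columns) on the same unique index would be rejected.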

But what happens if we now use Index Advanced Compression set to "HIGH" on a Unique Index?

Let’s see.

Let’s first create a simple table with a unique ID column:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, rownum, 'ZIGGY STARDUST' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

Let’s start by creating an uncompressed unique index on the ID column:

SQL> create unique index bowie_id_i on bowie(id);

Index created.

SQL> select index_name, leaf_blocks, compression from user_indexes where index_name='BOWIE_ID_I';

INDEX_NAME   LEAF_BLOCKS COMPRESSION
------------ ----------- -------------
BOWIE_ID_I          2088 DISABLED

So the uncompressed unique index has 2088 leaf blocks.

If we try and use normal compression on the index:

SQL> alter index bowie_id_i rebuild compress;
alter index bowie_id_i rebuild compress
*
ERROR at line 1:
ORA-25193: cannot use COMPRESS option for a single column key

We get an error saying we're not allowed to compress a single column unique index. Doing so makes no sense, as there's no benefit in de-duplicating such an index.

If we attempt to use advanced index compression with a value of “LOW”:

SQL> alter index bowie_id_i rebuild compress advanced low;
alter index bowie_id_i rebuild compress advanced low
*
ERROR at line 1:
ORA-25193: cannot use COMPRESS option for a single column key

We get the same error. Although advanced index compression of LOW is clever enough to automatically compress only those leaf blocks where there is a benefit in compression, there can be no such index leaf block that benefits from compression via the de-duplication method. Therefore, the error is really there just to let you know that you're wasting your time in attempting to do this on a unique index.

If however we use the new HIGH option with index advanced compression:

SQL> alter index bowie_id_i rebuild compress advanced high;

Index altered.

SQL> exec dbms_stats.gather_index_stats(ownname=>null, indname=>'BOWIE_ID_I');

PL/SQL procedure successfully completed.

SQL> select index_name, leaf_blocks, compression from user_indexes where index_name='BOWIE_ID_I';

INDEX_NAME   LEAF_BLOCKS COMPRESSION
------------ ----------- -------------
BOWIE_ID_I           965 ADVANCED HIGH

Not only does it not give us an error, but it has actually managed to successfully compress such a unique index containing nothing but a bunch of unique numbers to just 965 leaf blocks, down from 2088. The index is now less than half its previous size.

So any Oracle B-tree index, even if it’s a single column unique index, is a possible candidate to be compressed with “High” advanced index compression.
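Note that a rebuild isn't required to get this benefit; had we wished, a sketch of creating the same unique index compressed from the outset:

SQL> create unique index bowie_id_i on bowie(id) compress advanced high;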

More to come.

12.2 Index Advanced Compression “High” – Part I (High Hopes) December 6, 2016

Posted by Richard Foote in 12c Rel 2, 12c Release 2 New Features, Advanced Index Compression, Oracle Indexes.
7 comments

Oracle first introduced Advanced Compression for Indexes in 12.1 as I’ve discussed here a number of times.

With Oracle Database 12c Release 2, you can now use Index Advanced Compression "High" to further (and potentially dramatically) improve the index compression ratio. Instead of simply de-duplicating the index entries within an index leaf block, High Index Compression uses more complex compression algorithms and stores the index entries in a Compression Unit (similar to that used with Hybrid Columnar Compression). The net result is generally a much better level of compression, but at the potential cost of more CPU resources to both access and maintain the index structures.

To give you an idea on the possible compression improvements, let’s re-run the demo I used previously when I first discussed Advanced Index Compression.

So I first create a table, where the CODE column has many distinct values, but a portion (25%) of the data is replicated:

SQL> create table bowie (id number, code number, name varchar2(30));

Table created.

SQL> insert into bowie select rownum, rownum, 'ZIGGY STARDUST' from dual connect by level <= 1000000;

1000000 rows created.

SQL> update bowie set code = 42 where id between 250000 and 499999;

250000 rows updated.

SQL> commit;

Commit complete.

I then create an index on the CODE column and check out its initial size:

SQL> create index bowie_code_i on bowie(code);

Index created.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME   LEAF_BLOCKS COMPRESSION
------------ ----------- -------------
BOWIE_CODE_I        2158 DISABLED

 

If I just use normal compression on this index:

SQL> alter index bowie_code_i rebuild compress;

Index altered.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME   LEAF_BLOCKS COMPRESSION
------------ ----------- -------------
BOWIE_CODE_I        2684 ENABLED

 

We notice the index actually increases in size (2684 leaf blocks, up from 2158), as most (75%) of the CODE values are unique, and so the overheads associated with the resultant prefix table in the leaf blocks used with normal index compression outweigh the savings of compression on the 25% of the index where compression is beneficial.

If we use “Low” advanced index compression as introduced in 12.1:

SQL> alter index bowie_code_i rebuild compress advanced low;

Index altered.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME   LEAF_BLOCKS COMPRESSION
------------ ----------- -------------
BOWIE_CODE_I        2057 ADVANCED LOW

 

We notice the index has now indeed decreased in size (2057 down from 2158), as Oracle has automatically compressed just the 25% of the index where compression was beneficial and not touched the 75% of the index where compression wasn’t possible when de-duplicating values.

If we now however use the new 12.2 Advanced Index Compression “High” option:

SQL> alter index bowie_code_i rebuild compress advanced high;

Index altered.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME   LEAF_BLOCKS COMPRESSION
------------ ----------- -------------
BOWIE_CODE_I           0 ADVANCED HIGH

Wow, an index with no leaf blocks; that's unbelievably small. Actually, I don't believe it, as this is due to bug 22094934. We need to gather index statistics to see the new index size:

 

SQL> exec dbms_stats.gather_index_stats(ownname=>null, indname=>'BOWIE_CODE_I');

PL/SQL procedure successfully completed.

SQL> select index_name, leaf_blocks, compression from user_indexes where table_name='BOWIE';

INDEX_NAME   LEAF_BLOCKS COMPRESSION
------------ ----------- -------------
BOWIE_CODE_I         815 ADVANCED HIGH

 

We notice that the index hasn’t just gone now a tad in size, but is now substantially smaller than before (down to just 815 leaf blocks, rather than the smaller 2057 from 2158 reduction we previously achieved with low index advanced compression.

So Index Advanced Compression with the new "HIGH" option can substantially reduce index sizes. Note this new capability of course requires the Advanced Compression Option.
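If you're ever unsure whether the option is actually in use in a given environment, one (hedged) way to check is via the standard feature usage view:

SQL> select name, currently_used from dba_feature_usage_statistics
     where name like '%Compression%';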

More to come.

Oracle Database 12c Release 2: New Indexing Features – Long Identifiers (Several Species of Small Furry Animals Gathered Together in a Cave and Grooving With a Pict) November 24, 2016

Posted by Richard Foote in 12c Rel 2, 12c Release 2 New Features, Long Identifiers, Oracle Indexes.
1 comment so far

Oracle Database 12c Release 2 has now been officially released (at least on the Oracle Database Cloud and Oracle Database Exadata Express Cloud Services).

In the coming weeks, I’ll be blogging about quite a number of new indexing features/capabilities/improvements that have been introduced in Oracle Database 12c Release 2. These include:

  • Advanced Index Compression
  • Tracking Index Usage
  • Online Table Move (and automatic Index maintenance)
  • Online Conversion to Partitioned Table (and Partitioned Indexes)
  • Deferred Invalidation of Cursors During Index Creation/Rebuild
  • Indexing JSON Improvements
  • Text Index Improvements

To start things off, how many times have you wanted to name an index something like the following:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,10), 'DAVID BOWIE' from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index this_index_will_be_used_to_get_data_from_the_bowie_table_in_sc
enarios_when_the_code_column_is_used_in_predicates_idx on bowie(code);
create index this_index_will_be_used_to_get_data_from_the_bowie_table_in_scenari
os_when_the_code_column_is_used_in_predicates_idx on bowie(code)
*
ERROR at line 1:
ORA-00972: identifier is too long

Only to get an error because you've exceeded the 30-character identifier limit.

Well, with 12.2, the maximum length of an identifier has been increased to 128 characters, so now creating such an index works:


SQL> create index this_index_will_be_used_to_get_data_from_the_bowie_table_in_scenarios_when_the_code_column_is_used_in_predicates_idx on 
bowie(code);

Index created.

SQL> select index_name, leaf_blocks, status from dba_indexes where table_name='BOWIE';

INDEX_NAME                                         LEAF_BLOCKS STATUS
-------------------------------------------------- ----------- --------
THIS_INDEX_WILL_BE_USED_TO_GET_DATA_FROM_THE_BOWIE        1939 VALID
_TABLE_IN_SCENARIOS_WHEN_THE_CODE_COLUMN_IS_USED_I
N_PREDICATES_IDX

 

The possibilities are endless:

SQL> alter table bowie add constraint the_primary_key_of_the_bowie_table_is_the_
id_column_so_please_stop_trying_to_insert_a_duplicate_id_value_dumbo primary key
(id);

Table altered.

SQL> insert into bowie values (42, 42, 'David Bowie');
insert into bowie values (42, 42, 'David Bowie')
*
ERROR at line 1:
ORA-00001: unique constraint
(BOWIE.THE_PRIMARY_KEY_OF_THE_BOWIE_TABLE_IS_THE_ID_COLUMN_SO_PLEASE_STOP_TRYING_TO_INSERT_A_DUPLICATE_ID_VALUE_DUMBO) violated

Developers can now be, shall we say, more “expressive” with the database objects they create …

Oracle 12c: Indexing JSON in the Database Part III (Paperback Writer) September 2, 2016

Posted by Richard Foote in 12c, JSON, JSON Text Index, Oracle Indexes.
2 comments

In Part I and Part II, we looked at how to index specific attributes within a JSON document stored within an Oracle 12c database.

But what if we’re not sure which specific attributes¬†might benefit from an index or indeed, as JSON is by it’s nature a schema-less way to store data, what if we’re not entirely sure what attributes¬†might be present currently¬†or in the future.

On a JSON document store within the Oracle Database, you can create a special JSON-aware Text Index that automatically indexes any field/attribute within a JSON document, and then use a Text-based function to efficiently search for data from any attribute.

Using the same table created in Part I, you can create a JSON Text index as follows:

SQL> CREATE INDEX ziggy_search_idx ON ziggy_json (ziggy_order)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('section group CTXSYS.JSON_SECTION_GROUP SYNC (ON COMMIT)');
Index created.

Note this Text index is (optionally) defined to be automatically synchronised when data in the ZIGGY_JSON table is committed.
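If sync-on-commit overhead is a concern for busy tables, the index could instead be synchronised manually or on a schedule; a sketch of a manual sync using the standard Oracle Text CTX_DDL package:

SQL> exec ctx_ddl.sync_index('ZIGGY_SEARCH_IDX');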

We can use the JSON_TEXTCONTAINS Oracle Text function to efficiently access any data within the JSON-defined column. For example:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$.Reference', 'DBOWIE-201642');

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |    1 |  1534 |    2550 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |    1 |  1534 |    2550 (0) |  00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |    2549 (0) |  00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{DBOWIE-201642}
 INPATH(/Reference)')>0)

Statistics
----------------------------------------------------------
  65 recursive calls
   0 db block gets
 118 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

Note the JSON-based Text index is used to retrieve data efficiently.

The Text index can also be used to search data efficiently from within an array set:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$.LineItems.Part.Description', 'Low');

no rows selected

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |    1 |  1534 |    5927 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |    1 |  1534 |    5927 (0) | 00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |    5927 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{Low}
 INPATH(/LineItems/Part/Description)')>0)

Statistics
----------------------------------------------------------
132 recursive calls
  0 db block gets
182 consistent gets
  0 physical reads
  0 redo size
627 bytes sent via SQL*Net to client
552 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  0 rows processed

We can also search for a specific data value across any attribute within the JSON document store:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$', '4242');

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |    1 |  1534 |       2 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |    1 |  1534 |       2 (0) | 00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |       1 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{4242}')>0)
     
Statistics
----------------------------------------------------------
  10 recursive calls
   0 db block gets
  32 consistent gets
   0 physical reads
   0 redo size
1865 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

Let’s now add more data to the JSON column, but this time introducing a few new attributes (such as AlbumName):

SQL> insert into ziggy_json
 2 select
 3 rownum,
 4 SYSdate,
 5 '{"AlbumId" : ' || rownum || ',
 6 "AlbumName" : "HUNKY DORY",
 7 "ArtistName" : "David Bowie"}'
 8 from dual connect by level <= 10;
10 rows created.

SQL> commit;

Commit complete.

As the JSON-based Text index was defined to be automatically synchronised when we commit data in the table, these new attributes can be immediately searched and accessed via the index:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$.AlbumName', 'HUNKY DORY');

10 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |  198 |  296K |    1948 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |  198 |  296K |    1948 (0) | 00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |    1780 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{HUNKY DORY}
 INPATH(/AlbumName)')>0)
     
Statistics
----------------------------------------------------------
  48 recursive calls
   0 db block gets
 103 consistent gets
   0 physical reads
   0 redo size
6751 bytes sent via SQL*Net to client
3232 bytes received via SQL*Net from client
  22 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

Not only can JSON data be stored within the Oracle 12c database, but we have a number of index strategies available to search such data efficiently.

Oracle 12c: Indexing JSON in the Database Part II (Find A Little Wood) August 5, 2016

Posted by Richard Foote in 12c, JSON, JSON Text Index, Oracle Indexes.
4 comments

In Part I, we looked at how you can now store JSON documents within the Oracle 12c Database. For efficient access to JSON documents stored in the Oracle database, we can either create a function-based index based on the JSON_VALUE function or on JSON .dot notation.

These indexes are useful for indexing specific JSON attributes, but what if we want to index multiple JSON attributes within a single index structure?

To start, I’m just going to add an extra row to increase the selectivity of other columns.

SQL> insert into ziggy_json
2 select
3 100001,
4 SYSdate,
5 '{"PONumber" : 1000001,
6 "Reference" : "MTOM-20161",
7 "Requestor" : "Major Tom",
8 "User" : "MTOM",
9 "CostCenter" : "B42",
10 "ShippingInstructions" : {"name" : "Major Tom",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
17 {"type" : "Mobile", "number" : "417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual;

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

We can still create composite indexes based on the JSON_VALUE function as we can with conventional columns:

SQL> create index ziggy_json_idx3 on 
     ziggy_json(json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20)), 
                json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

If we now run a query with a couple of JSON_VALUE based predicates:

SQL> select * from ziggy_json
     where json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20))='MTOM' and 
           json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)) = 'B42';

Execution Plan
----------------------------------------------------------
Plan hash value: 3402615542

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |    1 |  1533 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON      |    1 |  1533 |       4 (0) | 00:00:01 |
|*2  | INDEX RANGE SCAN                    | ZIGGY_JSON_IDX3 |    1 |       |       3 (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.User' RETURNING VARCHAR2(20) NULL ON
ERROR)='MTOM' AND JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.CostCenter' RETURNING VARCHAR2(6) NULL ON ERROR)='B42')
 
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1248 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The composite index is effectively used by the CBO as expected.

It does though make our SQL a little cumbersome to write. To simplify things a tad, we could create a couple of virtual columns based on these functions, create the JSON function-based indexes on these virtual columns and simplify the SQL accordingly.

First, we create the virtual columns (note they’re virtual columns and so consume no storage):

 

SQL> ALTER TABLE ziggy_json ADD (userid VARCHAR2(20)
2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20))));

Table altered.

SQL> ALTER TABLE ziggy_json ADD (costcenter VARCHAR2(6)
2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6))));

Table altered.

 

Next, create the index based on these newly created virtual columns:

SQL> CREATE INDEX ziggy_user_costctr_idx on ziggy_json(userid, costcenter);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

And then write a simplified version of the SQL to reference the virtual columns:

SQL> select * from ziggy_json where userid='MTOM' and costcenter='B42';

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
--------------------------------------------------------------------------------

USERID COSTCE
-------------------- ------
100001 24-JUN-16
{"PONumber" : 1000001,
"Reference" : "MTOM-20161",
MTOM B42
  
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 5717455

--------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                   | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                        |    1 |  1535 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON             |    1 |  1535 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_USER_COSTCTR_IDX |    1 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("USERID"='MTOM' AND "COSTCENTER"='B42')
   
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1396 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The index is again used as expected.

Of course, if we still want to reference the JSON functions directly within the SQL, the query can still be written as previously:

SQL> select * from ziggy_json 
     where json_value(ziggy_order, '$.User' returning varchar2(20))='MTOM' and
           json_value(ziggy_order, '$.CostCenter' returning varchar2(6))='B42';

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
--------------------------------------------------------------------------------

USERID COSTCE
-------------------- ------
100001 24-JUN-16
{"PONumber" : 1000001,
"Reference" : "MTOM-20161",
MTOM B42
   
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 5717455

--------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                   | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                        |    1 |  1535 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON             |    1 |  1535 |       4 (0) | 00:00:01 |
|*2  | INDEX RANGE SCAN                    | ZIGGY_USER_COSTCTR_IDX |    1 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ZIGGY_JSON"."USERID"='MTOM' AND "ZIGGY_JSON"."COSTCENTER"='B42')

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1396 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The index is again used as expected.

I’ll next look at using a JSON Text based index to effectively index the entire JSON document.

Oracle 12c: Indexing JSON In The Database Part I (Lazarus) August 2, 2016

Posted by Richard Foote in 12c, JSON, Oracle Indexes.
add a comment

One of the very cool new features introduced in Oracle Database 12c Rel 1 is the ability to store JavaScript Object Notation (JSON) documents within the database. Unlike XML, which has its own data type, JSON data can be stored as VARCHAR2, CLOB or BLOB data types, but with a JSON check constraint to ensure the stored document meets JSON document standards.

This enables 'No-SQL' schema-less style development within the Oracle Database for "next generation" applications. Although developers can work with the JSON store without using SQL (via, say, RESTful APIs directly within the database), you can still leverage the power of SQL for reporting and analytics purposes. Of course, reading JSON data efficiently then becomes important, and that's where indexing the JSON document store kicks in.

Let’s look at a simple example.

Firstly, let's create a table with a column called ZIGGY_ORDER which stores JSON documents.

SQL> CREATE TABLE ziggy_json (id number, ziggy_date date, ziggy_order CLOB CONSTRAINT ensure_ziggy_json CHECK (ziggy_order IS JSON));

Table created.

The ZIGGY_ORDER column has a JSON check constraint which ensures only valid JSON documents can be stored. An attempt to insert a row with invalid JSON data will fail:

SQL> insert into ziggy_json values (1, sysdate, '{"This is not legal JSON"}');
insert into ziggy_json values (1, sysdate, '{"This is not legal JSON"}')
*
ERROR at line 1:
ORA-02290: check constraint (BOWIE.ENSURE_ZIGGY_JSON) violated

Let’s insert some JSON data into the table. Note I’m using ROWNUM to insert some almost unique JSON data and then re-inserting the data again to get data worth accessing via an index:

SQL> insert into ziggy_json
2 select
3 rownum,
4 SYSdate,
5 '{"PONumber" : ' || rownum || ',
6 "Reference" : "DBOWIE-2016' || rownum || '",
7 "Requestor" : "David Bowie",
8 "User" : "DBOWIE",
9 "CostCenter" : "A42",
10 "ShippingInstructions" : {"name" : "David Bowie",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
17 {"type" : "Mobile", "number" : "417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual connect by level <= 1000000;

1000000 rows created.

SQL> insert into ziggy_json select * from ziggy_json;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

There are a number of ways we can reference and access data from within JSON. One method is to use .dot notation to access specific JSON data elements of interest:

SQL> SELECT z.ziggy_order.PONumber FROM ziggy_json z where z.ziggy_order.PONumber=42;

PONUMBER
--------------------------------------------------------------------------------

42
42

We can also use the JSON_VALUE function to access specific JSON data of interest:

SQL> select json_value(ziggy_order, '$.Reference') from ziggy_json  
     where json_value(ziggy_order, '$.PONumber' returning number)=42;

JSON_VALUE(ZIGGY_ORDER,'$.REFERENCE')
--------------------------------------------------------------------------------

DBOWIE-201642
DBOWIE-201642

Without an index, the CBO has no choice but to use an expensive Full Table Scan:

SQL> SELECT z.ziggy_order.PONumber FROM ziggy_json z where z.ziggy_order.PONumber=42;

Elapsed: 00:00:34.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1413303849

--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows  | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |            | 20000 |   28M |    129K (3) | 00:00:06 |
|* 1 | TABLE ACCESS FULL | ZIGGY_JSON | 20000 |   28M |    129K (3) | 00:00:06 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_NUMBER(JSON_QUERY("Z"."ZIGGY_ORDER" FORMAT JSON ,
'$.PONumber' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL
ON ERROR))=42)

Statistics
----------------------------------------------------------
     0 recursive calls
     0 db block gets
500057 consistent gets
299745 physical reads
     0 redo size
   596 bytes sent via SQL*Net to client
   552 bytes received via SQL*Net from client
     2 SQL*Net roundtrips to/from client
     0 sorts (memory)
     0 sorts (disk)
     2 rows processed

 

At some 500,057 consistent gets and an elapsed time of 34.04 seconds, the above query is "slow" and expensive.

So one option to speed things up is to create a function-based index using the JSON_VALUE function. This can then be used to quickly access data that matches specific JSON name values of interest:

SQL> CREATE INDEX ziggy_po_num_idx ON 
     ziggy_json (json_value(ziggy_order, '$.PONumber' RETURNING NUMBER ERROR ON ERROR));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

As it's a function-based index, collecting statistics on the implicitly created virtual column is advisable.
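A quick way to confirm the hidden virtual column the index implicitly created (a sketch; output not shown here):

SQL> select column_name, hidden_column, virtual_column
     from dba_tab_cols where table_name='ZIGGY_JSON';

If we now re-run the query: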

SQL> select * from ziggy_json 
     where json_value(ziggy_order, '$.PONumber' returning number)=42;

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
------------------------------------------------------------
42 24-JUN-16
{"PONumber" : 42,
"Reference" : "DBOWIE-201642",
42 24-JUN-16
{"PONumber" : 42,
"Reference" : "DBOWIE-201642",

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1939019025

--------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name             | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                  |    2 |  3058 |       5 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON       |    2 |  3058 |       5 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX |    2 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING NUMBER ERROR ON
ERROR)=42)
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

The query now uses the JSON function-based index, performs just 9 consistent gets and completes in 0.01 second.

The index can be used for any data accesses in which the CBO considers the index the cheaper alternative:

SQL> select * from ziggy_json where json_value(ziggy_order, '$.PONumber' returning number)<42;

82 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1939019025

--------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name             | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                  |   82 |  122K |      86 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON       |   82 |  122K |      86 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX |   82 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING NUMBER ERROR ON
ERROR)<42)

Statistics
----------------------------------------------------------
    0 recursive calls
    0 db block gets
  249 consistent gets
    0 physical reads
    0 redo size
50623 bytes sent via SQL*Net to client
22528 bytes received via SQL*Net from client
  166 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
   82 rows processed

Indexes can also be created based on the .dot JSON notation:

SQL> CREATE INDEX ziggy_po_num_idx2 ON ziggy_json z (z.ziggy_order.PONumber);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

A query based on the associated JSON .dot notation can now run efficiently via the index:

SQL> SELECT * FROM ziggy_json z where z.ziggy_order.PONumber='42';

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 4224387816

---------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name              | Rows | Bytes| Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                   |    2 | 3062 |       6 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON        |    2 | 3062 |       6 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX2 |    2 |      |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_QUERY("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING VARCHAR2(4000) ASIS
WITHOUT ARRAY WRAPPER NULL ON ERROR)='42')

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

The query again uses the index and is just as efficient, with almost immediate response times at just 9 consistent gets.

JSON indexes can also be used to police and ensure data constraints and integrity (which can be problematic with JSON documents). The following numeric index example also implicitly adds data constraint capabilities:

SQL> CREATE INDEX ziggy_po_num_idx3 ON ziggy_json z (to_number(z.ziggy_order.PONumber));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

SQL> insert into ziggy_json
2 select
3 rownum,
4 SYSdate,
5 '{"PONumber" : "200000A",
6 "Reference" : "DBOWIE-2016' || rownum || '",
7 "Requestor" : "David Bowie",
8 "User" : "DBOWIE",
9 "CostCenter" : "A42",
10 "ShippingInstructions" : {"name" : "David Bowie",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" :"417-555-7777"},
17 {"type" : "Mobile", "number" :"417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual;
insert into ziggy_json
*
ERROR at line 1:
ORA-01722: invalid number

The PONumber value now has to be numeric for it to be successfully added to the JSON document store. The index of course can also be used for efficient data access:

SQL> SELECT * FROM ziggy_json z where to_number(z.ziggy_order.PONumber)=42;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 692052820

---------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name              | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                   |    2 |  3048 |      5 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON        |    2 |  3048 |      5 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX3 |    2 |       |      3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_NUMBER(JSON_QUERY("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING
VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON ERROR))=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

In Part II, I’ll look at how to create composite JSON indexes and how to use a text index to automatically index all name fields within a JSON document.

Next Round Of ANZ “Let’s Talk Database” Events (July/August 2016) June 20, 2016

Posted by Richard Foote in Let's Talk Database, Oracle Indexes.
11 comments

I'll be presenting the next round of "Let's Talk Database" events around Australia and NZ this winter in July/August 2016. These are free events, but due to limited places they have often "sold out" in the past, so booking early is recommended to avoid disappointment.

All events run between 9:00am and 12:30pm and are followed by a networking lunch.

We always have a fun time at these events and feedback suggests everyone learns something, so I hope if you're lucky enough to be in the ANZ region, you can make it to your local event 🙂

Currently, the confirmed events and topics are as follows:

  • Sydney – Wednesday, 20 July 2016: Let's Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: mitch.foster@oracle.com
  • Melbourne – Thursday, 21 July 2016: Let's Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: johanne.sergeant@oracle.com SOLD OUT!!
  • Canberra – Thursday, 28 July 2016: Let's Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: richard.foote@oracle.com
  • Wellington – Wednesday, 10 August 2016: Let's Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
  • Auckland – Thursday, 11 August 2016: Let's Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: tanya.fea@oracle.com
  • Perth – Wednesday, 17 August 2016: Let's Talk Database: Intro To CBO Costings, Database Futures. To register, email: dennis.ward@oracle.com
  • Adelaide – Thursday, 18 August 2016: Let's Talk Database: Oracle Database 12c – Built for Data Warehousing. To register, email: brodie.james@oracle.com
  • Brisbane – Wednesday, 24 August 2016: Let's Talk Oracle Database: Intro To CBO Costings, Database Futures. To register, email: gordon.laing@oracle.com

 

Details of the sessions:

Performance Diagnostics – How To Correctly Use AWR Reports

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting, as they contain so much database diagnostic information that they don't even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We'll go through a number of actual "real-life" examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.

Time permitting, we'll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH) and SQL Extended Tracing, and scenarios when they can prove invaluable in addressing database performance issues.

 

Introduction Into The CBO (Why Is The CBO doing a Full Table Scan?)

A common issue is confusion over why the Oracle Cost Based Optimizer (CBO) has decided to use a full table scan, or to use that index and not the one that appears more appropriate. The costing figures that the CBO derives and displays in execution plans are often viewed as cryptic, which is unfortunate as these costings are key to CBO decisions. Often the attempt to resolve such issues is to set inappropriate database parameters that introduce as many issues as they try to resolve. This session looks under the covers of the CBO to see how it precisely derives its various costings and hence how to correctly troubleshoot and resolve inefficient queries when they arise. Bring a calculator, it might come in handy 🙂

Database Futures – Exadata Secret Sauce and Software on Silicon

Oracle has recently released the new generation of the Exadata machine, the X6-2. Not only is it bigger and faster, but it also has a host of features and capabilities that are unique to the Exadata storage platform, including Smart Scans, Storage Indexes, Exafusion, Smart Fusion Block Transfer, Dual Format Columnar Flash Cache, I/O Latency Capping, Hybrid Columnar Compression, Sparse Snapshots and In-Memory Database Fault Tolerance. All these unique features and their enhancements with the new X6-2 will be discussed. Additionally, Oracle's Software-in-Silicon features extend the concepts of co-engineering hardware and software inherent in engineered systems such as the Exadata. Embedding software which has traditionally sat in the database layer into the hardware layer can yield both performance and security beyond that available from a traditional CPU. This presentation also covers how Oracle Database 12c can leverage Software-in-Silicon features, where the Oracle Database is heading in the future and why you should care.

 

Oracle Database 12c – Built for Data Warehousing

The Oracle Database is the leading database in the market, but it might come as a surprise to some that it's also the leading database with respect to Data Warehousing, in terms of both sales and analyst rankings. The focus today is a detailed look at all the special database capabilities that make the Oracle Database the perfect platform for Data Warehouse type applications.

We explore the overall Data Warehousing Information Management architecture and how it relates to both data warehouses and data reservoirs. We then deep dive into standard database features such as Materialized Views, Dimensions, Bitmap Indexes, Query Re-Write, JSON in the database, Partitioning, Parallel Processing, Compression, Attribute Clustering and SQL for analysis, reporting, aggregation and pattern matching.

We then discuss more advanced Oracle database capabilities such as Data Mining, Predictive Analytics, R in the database, Database In-Memory, Big Data SQL, Spatial & Graph and the specific engineered systems capabilities that makes Oracle such an effective Data Warehouse platform, especially when combined with the enterprise capabilities already associated with the Oracle Database such as security, high availability, scalability and manageability.

Storing Date Values As Numbers (The Numbers) June 1, 2016

Posted by Richard Foote in 12c, CBO, Histograms, Oracle Indexes, Storing Dates As Numbers.
7 comments

In my last couple of posts, I’ve been discussing how storing date data in a character based column is a really really bad idea.

In a follow-up question, I was asked if storing dates in NUMBER format was a better option. The answer is that it's probably an improvement on storing dates as strings, but it's still a really really bad idea. Storing dates in DATE format is easily the best option, as is storing any data in its native data type.

In this post, I'll highlight a few of the classic issues with storing dates in basic number format, as well as showing you some of the calculations behind the CBO cardinality estimates.

As usual, the demo starts with a basic little table that I’ll populate with date data stored in a NUMBER column (ZIGGY_DATE):

SQL> create table ziggy (id number, code number, ziggy_date number);
    
Table created.

SQL> insert into ziggy select rownum, mod(rownum,1000), 
to_number(to_char(sysdate-mod(rownum,10000), 'YYYYMMDD')) 
from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

We’ll now collect statistics on the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='ZIGGY';

COLUMN_NAME NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
----------- ------------ ---------- --------------- --- ---
ZIGGY_DATE         10000      .0001 NONE            NO  NO
CODE                1000       .001 NONE            NO  NO
ID               1000000    .000001 NONE            NO  NO

So the ZIGGY_DATE column has 10,000 distinct dates (with 100 rows per distinct date), with a column density of 1/10000 = 0.0001.

Let’s now create a standard B-Tree index on the ZIGGY_DATE column:

SQL> create index ziggy_date_i on ziggy(ziggy_date);
                  
Index created.

If we look at a sample of the data in the column and the min/max date ranges:

SQL> select * from ziggy where rownum <11;
        
        ID       CODE ZIGGY_DATE
---------- ---------- ----------
       776        776   20140412
       777        777   20140411
       778        778   20140410
       779        779   20140409
       780        780   20140408
       781        781   20140407
       782        782   20140406
       783        783   20140405
       784        784   20140404
       785        785   20140403

SQL> select min(ziggy_date) min, max(ziggy_date) max from ziggy;

       MIN        MAX
---------- ----------
  19890110   20160527

We see that all the data in the ZIGGY_DATE column are just number representations of dates, with a range between 10 Jan 1989 and 27 May 2016.

Note there are actually 10,000 days between the dates but the CBO would estimate a range of 270,417 possible days (20160527 – 19890110 = 270,417). The CBO has no idea that the “numbers” within the column are all dates and that there are ranges of values in which data is relatively popular (e.g. between say 20160101 and 20160131) and ranges of values in which data is relatively unpopular (e.g. say between 20154242 and 20159999).

Although not as bad as the range of possible unpopular values found within a character data type as I discussed previously when storing date data as a string, there is still enough data skew when storing dates as numbers to be problematic to the CBO.

If we select just one date with an equality predicate:

SQL> select * from ziggy where ziggy_date = 20150613;
                 
100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2700236208

----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |              |  100 |  1500 |     103 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY        |  100 |  1500 |     103 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_I |  100 |       |       3 (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ZIGGY_DATE"=20150613)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
110 consistent gets
0 physical reads
0 redo size
3883 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

The CBO gets things spot on, correctly estimating 100 rows to be returned, as the CBO knows there are only 10,000 distinct values of which only one of those values is being selected.

Selectivity is basically the density of the column = 1/10000 = 0.0001, so the estimated cardinality is 0.0001 x 1M rows = 100 rows. Perfect.
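As a quick sanity check, we can reproduce this arithmetic directly (just an illustrative worked calculation, not anything the CBO itself executes):

SQL> select (1/10000) * 1000000 as est_card from dual;

  EST_CARD
----------
       100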

However, if we perform a range based query as follows:

SQL> select * from ziggy where ziggy_date between 20151010 and 20151111;
     
3300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2700236208

----------------------------------------------------------------------------------------------------
| Id | Operation                           | Name         | Rows | Bytes | Cost (%CPU) | Time     |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |              |  573 |  8595 |     580 (1) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY        |  573 |  8595 |     580 (1) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_I |  573 |       |       4 (0) | 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ZIGGY_DATE">=20151010 AND "ZIGGY_DATE"<=20151111)

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
3531 consistent gets
0 physical reads
0 redo size
108973 bytes sent via SQL*Net to client
2961 bytes received via SQL*Net from client
221 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3300 rows processed

The CBO has got things somewhat incorrect in this example and has underestimated the expected number of rows (573 rows vs. the 3,300 rows actually returned).

The actual number of days between these dates is 33 so the actual ratio of data returned is 33/10000 x 1M rows = 3,300 rows. This is a range of “numbers” that overall covers a relatively “popular” range of date values.

However Oracle is estimating a range of some 20151111 – 20151010 = 101 days between these dates. As the total range of possible days is 20160527 – 19890110 = 270,417, the estimated ratio of returned rows is 101/270417 plus 2 x the selectivity of a day for the implicit 2 equality conditions (as a between is effectively >= and <=). The selectivity of one day is just the density of the column, 0.0001, as illustrated in the previous query.

Therefore, the query selectivity is derived as (101/270417) + (2 x 0.0001) = 0.000573, which when multiplied by 1M rows gives the 573 rows estimated by the CBO.
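The same derivation can be reproduced with a simple worked calculation (illustrative arithmetic only):

SQL> select round(((20151111-20151010)/(20160527-19890110) + 2*0.0001) * 1000000) est_rows from dual;

  EST_ROWS
----------
       573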

So the CBO is rather significantly *under* estimating the rows to be returned which could result in a sub-optimal execution plan (such as the inappropriate use of an index range scan as in this example, noting the poor clustering of the data).

If we now look at another range scan below:

SQL> select * from ziggy where ziggy_date between 20151225 and 20160101;
    
800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows  | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 33023 |  483K |    810 (15) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | ZIGGY | 33023 |  483K |    810 (15) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("ZIGGY_DATE">=20151225 AND "ZIGGY_DATE"<=20160101)

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
2824 consistent gets
0 physical reads
0 redo size
23850 bytes sent via SQL*Net to client
1135 bytes received via SQL*Net from client
55 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
800 rows processed

The actual number of days between these dates is only 8 so the actual ratio of data returned is 8/10000 x 1M rows = 800 rows. This is a range of “numbers” that overall covers a relatively “unpopular” range of date values.

However Oracle is estimating a range of some 20160101 – 20151225 = 8876 days between these dates. As the total range of possible days is 20160527 – 19890110 = 270,417, the estimated ratio of returned rows is 8876/270417 plus 2 x the selectivity of a single day again for the 2 implicit equality conditions.

Therefore, the query selectivity is derived as (8876/270417) + (2 x 0.0001) = 0.033023, which when multiplied by 1M rows gives the 33,023 rows estimated by the CBO.
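Again, a worked calculation (illustrative only) reproduces the CBO estimate:

SQL> select round(((20160101-20151225)/(20160527-19890110) + 2*0.0001) * 1000000) est_rows from dual;

  EST_ROWS
----------
     33023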

So the CBO is rather significantly *over* estimating the rows to be returned, which could again result in a sub-optimal execution plan (such as the inappropriate use of a Full Table Scan in this example). The CBO is simply not picking up the fact that most of the possible values within the “number” ranges aren’t valid dates and can’t possibly exist.

Of course, having dates stored as simple numbers means Oracle has no way of ensuring data integrity and can allow “invalid” dates to be inserted:

SQL> insert into ziggy values (1000001, 42, 20160599);
            
1 row created.

SQL> rollback;

Rollback complete.

As with dates stored as strings, we can again address these issues by collecting histograms on such columns and/or by creating a function-based date index on the column:

SQL> create index ziggy_date_fn_i on ziggy(to_date(ziggy_date,'YYYYMMDD'));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='ZIGGY';

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
SYS_NC00004$        10000      .0001 NONE            YES YES
ZIGGY_DATE          10000      .0001 HYBRID          NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO

The associated query with the equality predicate has accurate estimates as it did previously:

SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') = '13-JUN-2015';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 945728471

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes  | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |  100 |   2300 |     103 (0)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY           |  100 |   2300 |     103 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_FN_I |  100 |        |       3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')=TO_DATE(' 2015-06-13 00:00:00',
'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
111 consistent gets
0 physical reads
0 redo size
2877 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

As the virtual column created for the function-based index also has 10,000 distinct values and a corresponding density of 0.0001, the CBO is getting the cardinality estimate of 100 rows spot on.

But importantly, both associated range based queries are now also being accurately costed by the CBO as it now knows the data being searched is date based and hence can more accurately determine the actual expected dates to be returned within the specified “date” ranges.

SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') between '10-OCT-2015' and '11-NOV-2015';

3300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2421001569

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 3400 | 78200 |   1061 (35) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | ZIGGY | 3400 | 78200 |   1061 (35) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')>=TO_DATE('
2015-10-10 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')<=TO_DATE(' 2015-11-11
00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
2991 consistent gets
0 physical reads
0 redo size
95829 bytes sent via SQL*Net to client
2961 bytes received via SQL*Net from client
221 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
3300 rows processed

The CBO is now estimating not 573 rows but 3,400 rows, which is much closer to the actual 3,300 rows being returned. As a result, the CBO is now performing a more efficient Full Table Scan (due to the poor Clustering Factor of the index) than the Index Range Scan performed previously.

If we look at the other range scan query:

SQL> select * from ziggy where to_date(ziggy_date, 'YYYYMMDD') between '25-DEC-2015' and '01-JAN-2016';

800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 945728471

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |  900 | 20700 |      909 (1)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY           |  900 | 20700 |      909 (1)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_DATE_FN_I |  900 |       |        5 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')>=TO_DATE(' 2015-12-25 00:00:00',
'syyyy-mm-dd hh24:mi:ss') AND TO_DATE(TO_CHAR("ZIGGY_DATE"),'YYYYMMDD')<=TO_DATE(' 2016-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
861 consistent gets
7 physical reads
0 redo size
18917 bytes sent via SQL*Net to client
1135 bytes received via SQL*Net from client
55 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
800 rows processed

The CBO is now estimating not 33,023 rows but 900 rows, which is again much closer to the actual 800 rows being returned. As a result, the CBO is now performing a more efficient Index Range Scan than the Full Table Scan it was performing previously.

And of course, via the function-based date index, the database now has a manner in which to protect the integrity of the date data:

SQL> insert into ziggy values (1000001, 42, 20160599);
insert into ziggy values (1000001, 42, 20160599)
*
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month

However, the best way in Oracle to store “Date” data is within a Date data type column …
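For completeness, a minimal sketch of the native approach (using a hypothetical ZIGGY2 table), where the database rejects invalid dates automatically without needing any supporting index:

SQL> create table ziggy2 (id number, code number, ziggy_date date);

Table created.

SQL> insert into ziggy2 values (1000001, 42, to_date('20160599','YYYYMMDD'));
insert into ziggy2 values (1000001, 42, to_date('20160599','YYYYMMDD'))
*
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month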

Storing Date Values As Characters Part II (A Better Future) May 30, 2016

Posted by Richard Foote in 12c, CBO, Function Based Indexes, Oracle Indexes, Storing Dates as Characters.
2 comments

In the previous post, I discussed how storing date values within a character data type is a really really bad idea and illustrated how the CBO can easily get its costings totally wrong as a result. A function-based date index helped the CBO get the correct costings and protect the integrity of the date data.

During the demo, I re-collected statistics on the table, as the hidden virtual column associated with the function-based index doesn’t have statistics after the index is first created.

Before re-collecting statistics:

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';
 
COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
SYS_NC00004$                         NONE            YES YES
BOWIE_DATE          10000      .0001 NONE            NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO

And afterwards:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed. 

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column 
from dba_tab_cols where table_name='BOWIE'; 

COLUMN_NAME  NUM_DISTINCT DENSITY    HISTOGRAM       HID VIR 
------------ ------------ ---------- --------------- --- --- 
SYS_NC00004$        10000      .0001 NONE            YES YES 
BOWIE_DATE          10000      .0001 HYBRID          NO  NO 
CODE                 1000       .001 NONE            NO  NO 
ID                1000000    .000001 NONE            NO  NO

 

We can see that the hidden virtual column now has statistics.

But we also notice another difference, that being the BOWIE_DATE column now has a histogram (of type Hybrid).

As discussed in the previous post, the issue here is that the date data within the character column covers only a very specific subset of all the potential character values that could reside within the column. Therefore the CBO is getting the range scan selectivity hopelessly incorrect.

Now that we’ve run a few queries featuring the BOWIE_DATE column in the predicates, and as there’s effectively data skew within the column, the column becomes a candidate for a histogram with the default SIZE AUTO collection method.

The histogram now provides the CBO with a much more accurate picture of the distribution of the data within the BOWIE_DATE column, including the fact that between discrete “date” column values, there are only so many rows that qualify.

As a result of the histogram, the CBO can now make much more accurate cardinality estimates.
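Note that if the necessary column usage history weren’t in place for SIZE AUTO to kick in, a histogram could also be requested explicitly (a sketch using standard METHOD_OPT syntax):

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=>'FOR COLUMNS BOWIE_DATE SIZE 254');

PL/SQL procedure successfully completed.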

If we now re-run the query that actually returns 8300 rows, but for which the CBO previously estimated only 100 rows to be returned:

SQL> select * from bowie where bowie_date between '2015 10 10' and '2015 12 31';
                                             
8300 rows selected.
                                                            
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 4152 | 83040 |   1000 (12) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE | 4152 | 83040 |   1000 (12) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("BOWIE_DATE">='2015 10 10' AND "BOWIE_DATE"<='2015 12 31')

Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
4063 consistent gets
0 physical reads
0 redo size
282075 bytes sent via SQL*Net to client
6635 bytes received via SQL*Net from client
555 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
8300 rows processed

We see that at an estimated 4,152 rows, it’s a much better estimate. Not perfect, but maybe good enough to now get the more efficient Full Table Scan execution plan.

If we re-run the query that returned over 1/2 the table at some 570,000 rows but with the CBO previously estimating only 116 rows:

SQL> select * from bowie where bowie_date between '2000 10 10' and '2016 12 31';
  
570800 rows selected.
    
Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 572K |   10M |   1012 (13) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE | 572K |   10M |   1012 (13) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("BOWIE_DATE">='2000 10 10' AND "BOWIE_DATE"<='2016 12 31')
   
Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
41456 consistent gets
4 physical reads
0 redo size
19292352 bytes sent via SQL*Net to client
419135 bytes received via SQL*Net from client
38055 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
570800 rows processed

 

We see that at an estimate of 572K rows, it’s now got this just about right and again has made the right decision with the Full Table Scan execution plan.

Storing date data in character based columns is still a really really bad idea and limits the manner in which date data can be analysed, protected and accessed, but with appropriate histograms in place, at least the CBO has some chance of making a reasonable fist of things with some range based queries.

As a follow-up, I was asked if storing dates in NUMBER format is a better option than as a string. I’ll discuss that next.

Storing Date Values As Characters (What’s Really Happening) May 26, 2016

Posted by Richard Foote in 12c, Function Based Indexes, Oracle Indexes, Storing Dates as Characters.
1 comment so far

For something that’s generally considered an extremely bad idea, I’ve lost count of the number of times I’ve come across applications that insist on storing date values as characters within the database. We’ve all seen them …

I recently got called in to assist a customer who was having issues with a POC, in relation to the database not appearing to want to use the In-Memory Database option as expected. In various key scenarios, the CBO kept coming up with execution plans that used indexes (they were hesitant to drop these particular indexes), when if it only just used the Database In-Memory store, the queries ran so much faster. So I was called in to find out what was really happening and it turned out that the main culprit was indeed queries against columns where dates were stored as characters within the database. In the process, we found another issue with some “invalid” date values. Go figure.

Interestingly, both issues could be addressed by creating a new index on the date column …

I’ve kinda replicated the scenario here with the following little test case. I’ll begin by creating a table with a varchar2 field (bowie_date) that actually stores a whole bunch of “dates”:

SQL> create table bowie (id number, code number, bowie_date varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,1000), to_char(sysdate-mod(rownum,10000), 'YYYY MM DD') from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

So the bowie_date column basically has 10000 different dates, evenly distributed with 100 occurrences of each date.
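We can quickly confirm this distribution (an illustrative check only):

SQL> select count(distinct bowie_date) dates, count(*)/count(distinct bowie_date) rows_per_date from bowie;

     DATES ROWS_PER_DATE
---------- -------------
     10000           100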

I’ll now collect statistics on the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';

COLUMN_NAME  NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ ------------ ---------- --------------- --- ---
BOWIE_DATE          10000      .0001 NONE            NO  NO
CODE                 1000       .001 NONE            NO  NO
ID                1000000    .000001 NONE            NO  NO

 

So the bowie_date column indeed has 10000 distinct dates.

I’ll now create a standard b-tree index on this column:

SQL> create index bowie_date_i on bowie(bowie_date);

Index created.

So the data looks as follows, with bowie_date a varchar2 column that actually contains date data within the following min/max ranges:

SQL> select * from bowie where rownum <11;

        ID       CODE BOWIE_DATE
---------- ---------- ----------
       916        916 2013 11 22
       917        917 2013 11 21
       918        918 2013 11 20
       919        919 2013 11 19
       920        920 2013 11 18
       921        921 2013 11 17
       922        922 2013 11 16
       923        923 2013 11 15
       924        924 2013 11 14
       925        925 2013 11 13

10 rows selected.

SQL> select min(bowie_date) min, max(bowie_date) max from bowie;

MIN        MAX
---------- ----------
1989 01 09 2016 05 26

If we run a query that uses an equality predicate as follows:

SQL> select * from bowie where bowie_date = '2015 06 13';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1525056162

----------------------------------------------------------------------------------------------------
| Id | Operation                             | Name         | Rows | Bytes | Cost (%CPU)|     Time |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                      |              |  100 |  2000 |     103 (0)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED   | BOWIE        |  100 | 2000  |     103 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                      | BOWIE_DATE_I |  100 |       |       3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("BOWIE_DATE"='2015 06 13')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
110 consistent gets
16 physical reads
0 redo size
4383 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed 

The CBO gets things just about right. 100 rows are estimated and indeed 100 rows are retrieved. So we have confidence the CBO has made the right decision in using the index here as so few rows are actually retrieved.

However, if we run a range scan predicate such as the following:

SQL> select * from bowie where bowie_date between '2015 10 10' and '2015 12 31';

8300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1525056162

----------------------------------------------------------------------------------------------------
| Id | Operation                          | Name         | Rows | Bytes | Cost (%CPU)|     Time |
----------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                   |              |  100 | 2000  |     104 (0)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE        |  100 | 2000  |     104 (0)| 00:00:01 |
|*2  | INDEX RANGE SCAN                   | BOWIE_DATE_I |  100 |       |       3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("BOWIE_DATE">='2015 10 10' AND "BOWIE_DATE"<='2015 12 31')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
8881 consistent gets
38 physical reads
0 redo size
315219 bytes sent via SQL*Net to client
6635 bytes received via SQL*Net from client
555 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
8300 rows processed

The CBO has got the costings wrong here. It still estimates only 100 rows are to be returned when in actual fact 8300 rows come back.

If we select an even larger “date” range:

SQL> select * from bowie where bowie_date between '2000 10 10' and '2016 12 31';

570800 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1525056162

----------------------------------------------------------------------------------------------------
| Id | Operation                          | Name         | Rows | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT                   |              |  116 |  2320 |     120 (0)| 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE        |  116 |  2320 |     120 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                   | BOWIE_DATE_I |  116 |       |       3 (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("BOWIE_DATE">='2000 10 10' AND "BOWIE_DATE"<='2016 12 31')

Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
610491 consistent gets
1704 physical reads
0 redo size
21575496 bytes sent via SQL*Net to client
419135 bytes received via SQL*Net from client
38055 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
570800 rows processed

The CBO has got things seriously wrong here. We’re actually returning over 1/2 the table, some 570,800 rows but the CBO thinks only 116 rows will be returned. Why ?

The problem comes back to storing date values as characters. The CBO has absolutely no idea that these “characters” are actually meant to be dates and has no idea that the only valid ranges of values are date values.

With a possible range between “1989 01 09” and “2016 05 26”, any character range/combination of values (up to the 42 column size) could potentially exist in this column (the value ‘1zxgs$.jKN6tglhasgdlhlhd23bkbk?k’ for example).

So the required range between ‘2000 10 10’ and ‘2016 12 31’ actually represents a relatively narrow range of possible values within the range of all possible values (especially as the leading characters differ between the min/max values).

Hence the CBO estimates such a low number of rows to be returned and hence decides to incorrectly use the index. “Hiding” the meaning and distribution of values from the CBO in this manner can be problematic to say the least.

Worse of course is the possibility of “invalid” dates being entered, as the database has no implicit way to police the integrity of the data:

SQL> insert into bowie values (1000001, 42, '2016 6 31');

1 row created.

SQL> rollback;

Rollback complete.

We know there’s no such date as 31st June but Oracle has no idea that this is logically invalid data. Or the value ‘lh;ghsgdsd7gdGLH96bb’ for that matter …

Did I mention that storing dates in a character column is a really really bad idea …

Now there are a couple of ways to help address these issues if changing the column and its datatype is not possible due to the application. One way is to create a function-based index as follows on a date version of the column:

SQL> create index bowie_date_fn_i on bowie(to_date(bowie_date,'YYYY MM DD'));

Index created.

If we run the equality query but this time referencing the to_date function:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') = '13-JUN-2015';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 960797537

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows  | Bytes  | Cost (%CPU)|     Time |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 | 10000 |   273K |     893 (2)| 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE           | 10000 |   273K |     893 (2)| 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_DATE_FN_I |  4000 |        |       3 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE("BOWIE_DATE",'YYYY MM DD')=TO_DATE(' 2015-06-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
12 recursive calls
0 db block gets
113 consistent gets
16 physical reads
0 redo size
3268 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

We notice the CBO has got the row estimate way wrong here, thinking that 10000 rows, not 100 rows are to be returned. Why ?

Remember, when we create a function-based index, Oracle creates a hidden virtual column on the table as Oracle needs a way to store the statistics associated with the result set from the function. But these statistics aren’t populated until we next collect statistics on the table (or explicitly for just the hidden columns) and without the column statistics, the CBO can make poor assumptions:

SQL> select column_name, data_type, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';

COLUMN_NAME  DATA_TYPE NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ --------- ------------ ---------- --------------- --- ---
SYS_NC00004$ DATE                              NONE            YES YES
BOWIE_DATE   VARCHAR2         10000      .0001 NONE            NO  NO
CODE         NUMBER            1000       .001 NONE            NO  NO
ID           NUMBER         1000000    .000001 NONE            NO  NO

There are no column statistics for the virtual column but we note the data type of the virtual column is DATE. So let’s collect new statistics on the table:

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE');

PL/SQL procedure successfully completed.
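Note that rather than re-gathering statistics for the entire table, we could alternatively have targeted just the unpopulated hidden columns (a sketch using the documented METHOD_OPT option):

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'BOWIE', method_opt=>'FOR ALL HIDDEN COLUMNS SIZE AUTO');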
SQL> select column_name, data_type, num_distinct, density, histogram, hidden_column, virtual_column from dba_tab_cols where table_name='BOWIE';

COLUMN_NAME  DATA_TYPE NUM_DISTINCT    DENSITY HISTOGRAM       HID VIR
------------ --------- ------------ ---------- --------------- --- ---
SYS_NC00004$ DATE             10000      .0001 NONE            YES YES
BOWIE_DATE   VARCHAR2         10000      .0001 HYBRID          NO  NO
CODE         NUMBER            1000       .001 NONE            NO  NO
ID           NUMBER         1000000    .000001 NONE            NO  NO

There are actually two key differences in the above statistics, but in this post I’ll just focus on the fact that the hidden virtual column now has associated statistics (I’ll discuss the other key difference in my next blog post).

When we re-run the query:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') = '13-JUN-2015';

100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 960797537

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |  100 |  2800 |     102 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | BOWIE           |  100 |  2800 |     102 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | BOWIE_DATE_FN_I |  100 |       |       3 (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_DATE("BOWIE_DATE",'YYYY MM DD')=TO_DATE(' 2015-06-13 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
108 consistent gets
0 physical reads
0 redo size
3268 bytes sent via SQL*Net to client
618 bytes received via SQL*Net from client
8 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
100 rows processed

The CBO has now got the estimate spot on.

If we now run the previous range scan query that accessed 1/2 the table, referencing the to_date function:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') between '10-OCT-2000' and '31-DEC-2015';

556100 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows  | Bytes  | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       |  556K |    14M |   1328 (34)| 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE |  556K |    14M |   1328 (34)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_DATE("BOWIE_DATE",'YYYY MM DD')>=TO_DATE(' 2000-10-10
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND TO_DATE("BOWIE_DATE",'YYYY MM
DD')<=TO_DATE(' 2015-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
8 recursive calls
0 db block gets
40444 consistent gets
0 physical reads
0 redo size
18804277 bytes sent via SQL*Net to client
408355 bytes received via SQL*Net from client
37075 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
556100 rows processed

The CBO has got the estimates pretty well spot on and is now performing the far more efficient Full Table Scan. The CBO knows that the virtual column is of type DATE and therefore can much more accurately determine the actual cardinality estimates for the range scan on the “date” column.

If we now run the other corresponding range scan that returned a moderate number of rows:

SQL> select * from bowie where to_date(bowie_date, 'YYYY MM DD') between '10-OCT-2015' and '31-DEC-2015';

8300 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id | Operation         | Name  | Rows | Bytes  | Cost (%CPU) | Time     |
---------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |       | 8401 |   229K |   1205 (27) | 00:00:01 |
|* 1 | TABLE ACCESS FULL | BOWIE | 8401 |   229K |   1205 (27) | 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_DATE("BOWIE_DATE",'YYYY MM DD')>=TO_DATE(' 2015-10-10
00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND TO_DATE("BOWIE_DATE",'YYYY MM
DD')<=TO_DATE(' 2015-12-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Statistics
----------------------------------------------------------
14 recursive calls
0 db block gets
4058 consistent gets
0 physical reads
0 redo size
282075 bytes sent via SQL*Net to client
6635 bytes received via SQL*Net from client
555 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
8300 rows processed

We notice that again the CBO has got the row estimate just about right and because the index has a poor clustering factor, the CBO still decided to go down the Full Table Scan path.

Even if we don’t use the index all that much (perhaps the Database In-Memory store is a better option for most queries), it can still play an important role in policing the integrity of the data. An attempt to insert an invalid date will now automatically be captured by the database and fail:

SQL> insert into bowie values (1000001, 42, '2016 6 31');
insert into bowie values (1000001, 42, '2016 6 31')
*
ERROR at line 1:
ORA-01839: date not valid for month specified

As the function-based index can only be populated or updated if a valid date is inserted into the table, any attempt to insert an invalid date will fail thus protecting the integrity of the data.

The best “fix” here is to store these dates in a date field within the database, where the above issues are automatically addressed. If this is not possible, then the introduction and usage of an associated function-based index can certainly assist the CBO in making the correct decision.

However, if data integrity is not a concern (the application does a fantastic job of it and no one ever has to perform manual data fixes directly in the database), then there’s another option to help make the CBO do the right thing.

The clue is back in how I collected the table statistics, which I’ll discuss in my next post.

New Oracle Cloud Offering – Indexing as a Service (IDXaaS) (I Pity The Fool) April 1, 2016

Posted by Richard Foote in 12c Rel 2, IDXaaS, Oracle Cloud, Oracle Indexes.
12 comments

This of course is an April Fools joke. Sorry !!

A very exciting announcement !! I’ve recently been promoted within Oracle Corporation to lead their brand new Oracle Cloud offering “Indexing as a Service” (IDXaaS) team, based on my previous work and experience in the indexing space. Yes, I’m both thrilled and excited to be involved in this new venture 🙂

Promising both improved levels of service and reductions in maintenance and administration overheads, this new Oracle Cloud offering will take all indexing related hassles from on-premise databases and address them in the Cloud. No more index rebuilds, no more concerns you don’t have the right indexes in place and importantly no more worrying that indexes are taking up valuable TBs of storage. Oracle indexes can now be safely and seamlessly migrated and stored in the Oracle Cloud.

Relational databases depend heavily on indexes to ensure data can be retrieved quickly and efficiently, but indexes can be very difficult to administer, can use considerable resources to maintain and consume considerable amounts of disk space. They can be so problematic that some folk actually devote an entire blog to the subject !! Imagine if all these issues and overheads could be taken away from you when administering Oracle databases…

Index DDL will be enhanced in the 12c Release 2 Oracle Database to optionally specify a Cloud Domain Identifier (CDI) instead of a tablespace. So you will be able to either build or rebuild an index into the Oracle IDXaaS Cloud with syntax such as: CREATE INDEX bowie_idx on bowie(id) CLOUD LOCATION bowies_cloud_domain; This index will fully index the bowie table that exists in an on-premise database (or indeed a database that is also in the cloud) but the index itself will be created and maintained in the Oracle Cloud. The database is fully aware of the actual physical location of the index, as it is when created in a traditional tablespace, and will simply update the index structure as DML is applied to the base table as it has always done. All Oracle indexing types and options will be fully supported, including Oracle Text and Spatial indexes.

Like indexes stored in a tablespace, these indexes will need to be rebuilt or coalesced every now and then, but you won’t need to worry about this as Oracle will now implicitly do this for you in the Oracle Cloud. I’m currently hiring a team of DBAs with expertise in the “art” of rebuilding indexes who will ensure all your indexes will be rebuilt automatically for customers as necessary. By default, these operations will usually be performed on Sundays within the Oracle Cloud, but as indexes can be rebuilt online and as these indexes are no longer physically stored within your database, it’s an operation that is now performed automatically and seamlessly for you. Customers no longer need concern themselves with these types of index operations or when they occur, so while “we” work over the weekend, your DBAs can instead relax and keep their gardens nice and weed free. Please leave a comment on the blog if you think you have what it takes to effectively rebuild Oracle indexes and are interested in joining my elite IDXaaS team !!

For customers that choose to use the “Premium IDXaaS” offering, Oracle will automatically create or drop indexes as required in the Oracle Cloud. Customers no longer have to make the decision on what indexes should be created on which columns in which order; this will be performed automatically for you by Oracle’s IDXaaS. By running new advanced analysis routines on your database workloads, the Premium IDXaaS offering will create an index if it will help the efficiency of any of your current workloads. Conversely, if Oracle believes an index is redundant (for example if your database is running on Exadata), indexes will first be automatically made “invisible” and if there are no serious repercussions, will be automatically dropped 7 days later. DBAs and Developers will no longer need to worry about which indexes to create/drop, significantly reducing the complexities in managing both current and future Oracle database environments.

Oracle will also introduce a new package that will easily migrate all your existing indexes for you into the Oracle Cloud. The DBMS_INDEX_CLOUD package will automatically migrate all indexes for a specified Table/Partition/Schema/Pluggable Database/Database into the Oracle Cloud, which will be a fully online, seamless operation. Once completed, associated indexes within “on premise” tablespaces will effectively be dropped and be replaced with their Cloud equivalents. All that “expensive” storage that was previously tied up in storing all those Oracle indexes can now be freed up to store much more business rich data such as emails, selfie photos, David Bowie music catalogs, etc. Note that these significant storage savings will also translate into smaller backups, smaller Standby databases, smaller Production clones, etc. Importantly, the Oracle Cost Based Optimizer will be fully aware of the Cloud Indexes (as their definitions, statistics and other metadata are still stored within the database data dictionary) and will use the Cloud Indexes as appropriate. No changes to existing SQL are necessary; the CBO will decide to use an index in the cloud in the same manner as it chooses to use an index currently. There is also a new Cloud Index execution path option that allows data to be retrieved via the Cloud Index in a “good enough” manner which is much more efficient than a normal index scan, although it might not necessarily retrieve all the possible data that meets an SQL predicate. It effectively provides what we refer to as that “internet” data retrieval experience.

Oracle Open World 2016 will be the time of the official release of this exciting new Oracle Cloud offering. I will be presenting a number of sessions at OOW16 on this new IDXaaS and there will be a number of live demos to demonstrate its unique capabilities and the simplicity of the migration process. However, there will be a number of announcements before then, including the opportunity to be involved in a beta customer program. There will also be a number of hands-on workshops being conducted globally, with customers getting the chance to see for themselves how easy it is to move database indexes into the Oracle Cloud.

Lastly and perhaps most exciting of all, this new IDXaaS capability will initially be a free option with all current Oracle databases from 12c Release 2 onwards. You will be able to migrate all your existing 12.2 Oracle database indexes onto the Oracle Cloud and they will be stored, maintained and administered for free as part of your current Oracle database licenses. All Oracle database editions will be supported. This free offer however is likely to end once the 12c Release 2 database is officially released, but the additional pay-as-you-go licensing constructs are proposed to be extremely competitive, with yearly “indexed” price increments. Also to be subsequently released will be Oracle IDXaaS support for most other commercial databases including DB2, SQL Server, NoSQL, MySQL, Sybase, Hana, Hadoop and Notepad.

So administering Oracle (and other) databases will soon change forever. Oracle indexes will no longer have to be a concern for customers as all database indexes can instead be stored in the Oracle Cloud via the new IDXaaS offering, allowing Oracle to automatically store and manage these tricky database objects for you. Not only will index rebuilds be a thing of the past, but so will all concerns regarding index storage, creation and maintenance. I’m very excited to be involved in this new undertaking and indeed with my move to Oracle HQ in San Francisco in the coming months (I’ll be up in the Clouds level of Building 1). I look forward to talking IDXaaS with you in the days to come 🙂

 

This of course is an April Fools joke. Sorry !!

New Round of Let’s Talk Database Events February 12, 2016

Posted by Richard Foote in Oracle Indexes.
3 comments

I’ll be presenting a new round of “Let’s Talk Database” events around Australia and NZ next month. These are free events but have often “sold out” in the past, so booking early is recommended to avoid disappointment.

All events run between 9:00am – 12:30pm and are followed by a networking lunch.

Currently, the confirmed events are:

  • Sydney – Tuesday, 22 March 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: mitch.foster@oracle.com
  • Melbourne – Wednesday, 23 March 2016: Let’s Talk Oracle Database: Oracle 12c Database New Features. To register, email: johanne.sergeant@oracle.com SOLD OUT !!
  • Canberra – Thursday, 31 March 2016: Let’s Talk Database: Intro To CBO Costings, APEX New Features and Software-on-Silicon. To register, email: richard.foote@oracle.com
  • Perth – Thursday, 5 May 2016: Let’s Talk Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: dennis.ward@oracle.com
  • Adelaide – Friday, 6 May 2016: Let’s Talk Database: Intro To CBO Costings, Database Futures. To register, email: brodie.james@oracle.com
  • Wellington – Wednesday, 11 May 2016: Let’s Talk Oracle Database: Oracle 12c Database New Features. To register, email: tanya.fea@oracle.com
  • Brisbane – Wednesday, 18 May 2016: Let’s Talk Oracle Database: Performance Diagnostics – How To Correctly Use AWR Reports. To register, email: gordon.laing@oracle.com

 

More “Let’s Talk Database” events will be added in the coming days.

Details of sessions:

Oracle 12c Database New Features

An in-depth technical look at many of the exciting enhancements and new capabilities introduced in the Oracle 12c Database that help make life easier for the DBA and Oracle Developer. Among a number of topics, we’ll have a look at changes with respect to the Cost Based Optimizer, new statistics gathering options, discuss many of the new indexing related capabilities such as Partial Indexes, Advanced Index Compression, Asynchronous Global Index Maintenance, etc. and look at many of those cool little things such as In-Database Archiving, Automatic Data Optimization, Identity Columns, Row-Limiting Clauses, Temporary Undo, Online Move operations, Clustering Attributes, Temporal and Improved Flashback Queries etc. to name but a few that can sometimes make all the difference. Might even squeeze in a bit of Database In-Memory. So plenty to talk about !!!

Performance Diagnostics – How To Correctly Use AWR Reports

I often get called in by customers to determine and address the root cause of database performance issues. Depending on the issue, a request for a simple Automatic Workload Repository (AWR) report is often sufficient to accurately diagnose the root problem(s). However, many DBAs find AWR reports daunting as they contain so much database diagnostic information, they don’t even know where to begin to find the truly useful information.

In this session, I begin by discussing the basic concepts of response times, DB times, the Oracle Wait Interface and how to focus on what is truly important. I then look at how to best read an AWR report to quickly go to the most relevant sections that detail any specific issues. I also discuss a general tuning and diagnostic methodology that ensures one can quickly determine whether an AWR report will indeed be sufficient and how to accurately and consistently use the AWR report to pinpoint and determine root causes for global database performance issues. We’ll go through a number of actual “real-life” examples that highlight various performance issues and how one accurately determines the actual root issues through an appropriate AWR report.

Time permitting, we’ll go through some other useful diagnostic tools such as the AWR Warehouse, Automatic Database Diagnostic Monitor (ADDM), Active Session History (ASH) and SQL Extended Tracing, and scenarios when they can prove to be invaluable in addressing database performance issues.

Intro To CBO Costings, APEX New Features and Software-on-Silicon

A common issue is confusion on why the Oracle Cost Based Optimizer (CBO) has made the decision to use a full table scan or to use that index and not the one that appears more appropriate. The costing figures that the CBO derives and displays in execution plans are often viewed as cryptic, which is unfortunate as these costings are key to CBO decisions. Often the attempt to resolve such issues is to set inappropriate database parameters, which introduces as many issues as it tries to resolve. This session looks under the covers of the CBO to see how it precisely derives its various costings and hence how to correctly troubleshoot and resolve inefficient queries. Bring a calculator, it might come in handy 🙂

Oracle Application Express (APEX) is a rapid development environment built into the Oracle database. APEX 5 is the latest and largest release of Application Express. This APEX session will include an overview of APEX concentrating on new features, followed by a live demonstration. The session will also look at how easy it is to deploy APEX applications in the cloud.

Oracle’s Software-in-Silicon features extend the concepts of co-engineering hardware and software inherent in engineered systems such as the Exadata. Embedding software which has traditionally sat in the database layer into the hardware layer can yield both performance and security beyond that available from a traditional CPU. This presentation covers how Oracle Database 12c can leverage Software-in-Silicon features, and how third party developers can make use of the new features within their own applications.

 

Hope to see you at one of these events 🙂

Indexes and Initrans (Blackstar) January 14, 2016

Posted by Richard Foote in Oracle Indexes.
7 comments

It’s been a very tough week, so to help keep my mind off rather sad events, I thought I’d finish off one of my unpublished articles.

Initrans is a physical attribute that determines the initial number of concurrent transaction entries allocated within each data block for a given table/index/cluster. Every transaction that updates a block has to acquire an Interested Transaction List (ITL) slot, in order to store the transaction id, rollback information and ultimately lock the necessary row/index entry within the block.

For a table, the inability to acquire a necessary ITL can lead to hanging transactions and the resultant poor performance. A simple demonstration.

I’ll first create and populate a table with PCTFREE 0 so that the table blocks are full with no free space:

SQL> create table ziggy (id number, code number, name varchar2(42)) pctfree 0;

Table created.

SQL> insert into ziggy select rownum, mod(rownum,100), 'Ziggy Stardust'
from dual connect by level <= 1000000;

1000000 rows created.

SQL> commit;

Commit complete.
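For those who want to play along at home, a block dump of this kind can be generated along the following lines (a sketch only, the file and block numbers shown are placeholders for your own values, and the dump is written to a trace file in the diagnostic trace directory):

SQL> select dbms_rowid.rowid_relative_fno(rowid) rfile#,
            dbms_rowid.rowid_block_number(rowid) block#
     from ziggy where id = 1;

    RFILE#     BLOCK#
---------- ----------
         6      23351

SQL> alter system checkpoint;

System altered.

SQL> alter system dump datafile 6 block 23351;

System altered.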

If we look at a partial block dump of an example block from the table (which happens to hold the first rows inserted):

Block header dump: 0x01835b37
Object id on Block? Y
seg/obj: 0x190b7 csc: 0x00.2affca5 itc: 2 flg: E typ: 1 - DATA
brn: 0 bdba: 0x1835b30 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0009.010.0000d52e 0x014001a4.0576.0e --U- 302 fsc 0x0000.02affcf5
0x02 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
bdba: 0x01835b37
data_block_dump,data header at 0x1fc8264
===============
tsiz: 0x1f98
hsiz: 0x26e
pbl: 0x01fc8264
76543210
flag=--------
ntab=1
nrow=302
frre=-1
fsbo=0x26e
fseo=0x283
avsp=0x15
tosp=0x15
0xe:pti[0] nrow=302 offs=0
0x12:pri[0] offs=0x718
0x14:pri[1] offs=0x730
0x16:pri[2] offs=0x748
0x18:pri[3] offs=0x760
0x1a:pri[4] offs=0x778

….

0x26a:pri[300] offs=0x6e6
0x26c:pri[301] offs=0x6ff
block_row_dump:
tab 0, row 0, @0x718
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 02
col 1: [ 2] c1 02
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 1, @0x730
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 03
col 1: [ 2] c1 03
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 2, @0x748
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 04
col 1: [ 2] c1 04
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 3, @0x760
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 05
col 1: [ 2] c1 05
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74
tab 0, row 4, @0x778
tl: 24 fb: --H-FL-- lb: 0x1 cc: 3
col 0: [ 2] c1 06
col 1: [ 2] c1 06
col 2: [14] 5a 69 67 67 79 20 53 74 61 72 64 75 73 74

….

The key aspect to note here is that each table block is allocated two ITL entries by default. Therefore, two concurrent transactions can safely update this block, but as there is insufficient free space for an additional ITL to be dynamically added, what happens if we have three concurrent transactions ?

In session one:

SQL> delete ziggy where id=1;

1 row deleted.

In session two:

SQL> delete ziggy where id=2;

1 row deleted.

But in session three:

SQL> delete ziggy where id=3;


It hangs and will do so until one of the other transactions completes.
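If we were to check what session three is waiting on (an illustrative query only, the SID is obviously environment specific), we’d see the classic ITL contention wait event:

SQL> select sid, event from v$session
     where event = 'enq: TX - allocate ITL entry';

       SID EVENT
---------- ------------------------------
       133 enq: TX - allocate ITL entry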

So Initrans can be used to preallocate more ITL entries (up to 255) to prevent these types of scenarios.
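For an existing table, this might look something like the following (again just a sketch), remembering that altering Initrans only impacts newly formatted blocks, so a reorganisation such as a Move is required to recreate the existing blocks (which in turn renders any dependent indexes unusable until rebuilt):

SQL> alter table ziggy initrans 5;

Table altered.

SQL> alter table ziggy move;

Table altered.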

But what of indexes, is the story the same ? Well, no as we’ll see.

I’ll create another table, this time giving the table its default 10% of free space while creating the index with PCTFREE 0:

SQL> create table bowie (id number, code number, name varchar2(42));

Table created.

SQL> insert into bowie select rownum, mod(rownum,100), 'David Bowie'
from dual connect by level <=1000000;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> create index bowie_id_i on bowie(id) pctfree 0;

Index created.

If we look at a partial block dump of the first leaf block in the index:

Block header dump: 0x018ad71c
Object id on Block? Y
seg/obj: 0x190b6 csc: 0x00.2aff9d5 itc: 2 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02aff9d5
Leaf block dump
===============
header address 33325668=0x1fc8264
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 540
kdxcofbo 1116=0x45c
kdxcofeo 1120=0x460
kdxcoavs 4
kdxlespl 0
kdxlende 0
kdxlenxt 25876253=0x18ad71d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8036
row#0[8024] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 80 01 57 00 00
row#1[8012] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 03
col 1; len 6; (6): 01 80 01 57 00 01
row#2[4523] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 04
col 1; len 6; (6): 01 80 01 57 00 02
row#3[4535] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 05
col 1; len 6; (6): 01 80 01 57 00 03
row#4[4547] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 06
col 1; len 6; (6): 01 80 01 57 00 04

We notice that it also has two ITL entries by default.

However, the first ITL entry is reserved exclusively for Oracle recursive operations on the index (such as changes as a result of an index block split). An index leaf block can therefore only handle one concurrent transaction by default.

Notice also that the leaf block currently contains 540 index entries.

So what happens if we have 2 concurrent transactions wanting to update the same leaf block ?

In session one:

SQL> delete bowie where id=1;

1 row deleted.

In session two:

SQL> delete bowie where id=2;

1 row deleted.

Both complete successfully !! So what has occurred here ? Let’s look at a fresh partial block dump of this leaf block:

Block header dump: 0x018ad71c
Object id on Block? Y
seg/obj: 0x190b6 csc: 0x00.2affa76 itc: 3 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0007.00b.0000ce8a 0x0140da06.0568.01 CB-- 0 scn 0x0000.02affa76
0x02 0x0001.013.0000cd57 0x01400083.0559.02 ---- 1 fsc 0x000e.00000000
0x03 0x0005.01a.0000d062 0x014009eb.0532.13 ---- 1 fsc 0x000e.00000000
Leaf block dump
===============
header address 33325692=0x1fc827c
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 278
kdxcofbo 592=0x250
kdxcofeo 4499=0x1193
kdxcoavs 3907
kdxlespl 0
kdxlende 2
kdxlenxt 25675901=0x187c87d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 8012
row#0[4499] flag: ---DS--, lock: 2, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 80 01 57 00 00
row#1[4511] flag: ---D---, lock: 3, len=12
col 0; len 2; (2): c1 03
col 1; len 6; (6): 01 80 01 57 00 01
row#2[4523] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 04
col 1; len 6; (6): 01 80 01 57 00 02
row#3[4535] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 05
col 1; len 6; (6): 01 80 01 57 00 03
row#4[4547] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 06
col 1; len 6; (6): 01 80 01 57 00 04

So we have a few interesting things happening here.

Firstly, Oracle has allocated an additional ITL entry; we now have three in this leaf block. And indeed, both deleted rows have the deleted byte (D) set. But how is this possible when the leaf block was previously full ?

When an index block is full but additional data needs to be inserted (as in the case of a new index entry), Oracle has a mechanism to cater for this. It’s called an index block split and this is exactly what has occurred here. The second transaction needed additional space within the leaf block for a new ITL entry, so Oracle performed a 50-50 index block split, thus providing a half-empty block in which to insert the new ITL entry.

If we look at the number of index entries now in the leaf block (kdxconro), we notice it has gone down from 540 to just 278. If we look at the next leaf block pointer (kdxlenxt), we notice it now points to a different block (25675901=0x187c87d).
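We can also see the effect at the index level via a structure validation (a sketch only with indicative figures, noting that Analyze Validate Structure locks the underlying table for its duration):

SQL> analyze index bowie_id_i validate structure;

Index analyzed.

SQL> select lf_blks, lf_rows, del_lf_rows from index_stats;

   LF_BLKS    LF_ROWS DEL_LF_ROWS
---------- ---------- -----------
      1877    1000000           2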

An index block split, which ordinarily occurs frequently enough within an index, is a much better option than having a hanging transaction suffering from ITL contention issues.
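If you’re curious just how frequently such leaf block splits occur in your own database, the associated statistics are easily queried (an illustrative example with made-up values):

SQL> select name, value from v$sysstat
     where name in ('leaf node splits', 'leaf node 90-10 splits');

NAME                                VALUE
------------------------------ ----------
leaf node splits                    24863
leaf node 90-10 splits               7232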

As we now have heaps of free space within the leaf block, let’s perform yet another concurrent transaction:

In session three:

SQL> delete bowie where id=3;

1 row deleted.

It has also completed successfully. A fresh partial block dump of the index leaf block will show that Oracle has just added yet another ITL entry:

Block header dump: 0x018ad71c
Object id on Block? Y
seg/obj: 0x190b6 csc: 0x00.2affb79 itc: 4 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0007.00b.0000ce8a 0x0140da06.0568.01 CB-- 0 scn 0x0000.02affa76
0x02 0x0001.013.0000cd57 0x01400083.0559.02 ---- 1 fsc 0x000e.00000000
0x03 0x0005.01a.0000d062 0x014009eb.0532.13 ---- 1 fsc 0x000e.00000000
0x04 0x0006.015.0000cfbe 0x0140040f.055e.18 ---- 1 fsc 0x000e.00000000
Leaf block dump
===============
header address 33325716=0x1fc8294
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 1
kdxconro 278
kdxcofbo 592=0x250
kdxcofeo 4475=0x117b
kdxcoavs 3883
kdxlespl 0
kdxlende 3
kdxlenxt 25675901=0x187c87d
kdxleprv 0=0x0
kdxledsz 0
kdxlebksz 7988
row#0[4475] flag: ---DS--, lock: 2, len=12
col 0; len 2; (2): c1 02
col 1; len 6; (6): 01 80 01 57 00 00
row#1[4487] flag: ---D---, lock: 3, len=12
col 0; len 2; (2): c1 03
col 1; len 6; (6): 01 80 01 57 00 01
row#2[4499] flag: ---D---, lock: 4, len=12
col 0; len 2; (2): c1 04
col 1; len 6; (6): 01 80 01 57 00 02
row#3[4511] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 05
col 1; len 6; (6): 01 80 01 57 00 03
row#4[4523] flag: -------, lock: 0, len=12
col 0; len 2; (2): c1 06
col 1; len 6; (6): 01 80 01 57 00 04

So indexes are far more forgiving than tables when it comes to concurrency issues in relation to transactions acquiring ITL entries.

If we want to avoid the (generally minimal) overheads of dynamically allocating new ITL entries within indexes, or the unnecessary wasted space that such block split operations can introduce in compactly created, monotonically increasing indexes, we can use Initrans to preallocate more than the default two ITL entries for indexes:

SQL> drop index bowie_id_i;

Index dropped.

SQL> create index bowie_id_i on bowie(id) initrans 10;

Index created.

If we again look at a partial block dump of one of the index leaf blocks:

Block header dump: 0x018ad71d
Object id on Block? Y
seg/obj: 0x190b2 csc: 0x00.2aff443 itc: 10 flg: E typ: 2 - INDEX
brn: 0 bdba: 0x18ad718 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x02 0xffff.000.00000000 0x00000000.0000.00 C--- 0 scn 0x0000.02aff443
0x03 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x04 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x05 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x06 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x07 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x08 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x09 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
0x0a 0x0000.000.00000000 0x00000000.0000.00 ---- 0 fsc 0x0000.00000000
Leaf block dump
===============
header address 33325860=0x1fc8324
kdxcolev 0
KDXCOLEV Flags = - - -
kdxcolok 0
kdxcoopc 0x80: opcode=0: iot flags=--- is converted=Y
kdxconco 2
kdxcosdc 0
kdxconro 466
kdxcofbo 968=0x3c8
kdxcofeo 1791=0x6ff
kdxcoavs 823
kdxlespl 0
kdxlende 0
kdxlenxt 25876254=0x18ad71e
kdxleprv 25876252=0x18ad71c
kdxledsz 0
kdxlebksz 7844
row#0[7831] flag: -------, lock: 0, len=13
col 0; len 3; (3): c2 05 4b
col 1; len 6; (6): 01 80 01 53 00 a6
row#1[7818] flag: -------, lock: 0, len=13
col 0; len 3; (3): c2 05 4c
col 1; len 6; (6): 01 80 01 53 00 a7

But this is generally unnecessary for indexes and just wastes space within index leaf blocks.
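As a final aside, the Initrans setting for an index is readily visible in the data dictionary if you ever need to confirm what has been configured (a trivial example):

SQL> select index_name, ini_trans from user_indexes
     where index_name = 'BOWIE_ID_I';

INDEX_NAME    INI_TRANS
------------ ----------
BOWIE_ID_I           10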

Now back to my grieving 😦