jump to navigation

Oracle 12c: Indexing JSON in the Database Part II (Find A Little Wood) August 5, 2016

Posted by Richard Foote in 12c, JSON, JSON Text Index, Oracle Indexes.
4 comments

In Part I, we looked at how you can now store JSON documents within the Oracle 12c Database. For efficient accesses to JSON documents stored in the Oracle database, we can either create a function-based index based on the JSON_VALUE function or on JSON .dot notation.

These indexes are useful for indexing specific JSON attributes, but what if we want to index multiple JSON attributes within a single index structure.

To start, I’m just going to add an extra row to increase the selectivity of other columns.

SQL> insert into ziggy_json
2 select
3 100001,
4 SYSdate,
5 '{"PONumber" : 1000001,
6 "Reference" : "MTOM-20161",
7 "Requestor" : "Major Tom",
8 "User" : "MTOM",
9 "CostCenter" : "B42",
10 "ShippingInstructions" : {"name" : "Major Tom",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
17 {"type" : "Mobile", "number" : "417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual;

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

We can still create composite indexes based on the JSON_VALUE function as we can with conventional columns:

SQL> create index ziggy_json_idx3 on 
     ziggy_json(json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20)), 
                json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

If we now run a query with a couple of JSON_VALUE based predicates:

SQL> select * from ziggy_json
     where json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20))='MTOM' and 
           json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)) = 'B42';

Execution Plan
----------------------------------------------------------
Plan hash value: 3402615542

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |    1 |  1533 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON      |    1 |  1533 |       4 (0) | 00:00:01 |
|*2  | INDEX RANGE SCAN                    | ZIGGY_JSON_IDX3 |    1 |       |       3 (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.User' RETURNING VARCHAR2(20) NULL ON
ERROR)='MTOM' AND JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.CostCenter' RETURNING VARCHAR2(6) NULL ON ERROR)='B42')
 
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1248 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The composite index is effectively used by the CBO as expected.

It does though make our SQL a little cumbersome to write. To simplify things a tad, we could create a couple of virtual columns based on these functions, create the JSON function-based indexes on these virtual columns and simplify the SQL accordingly.

First, we create the virtual columns (note they’re virtual columns and so consume no storage):

 

SQL> ALTER TABLE ziggy_json ADD (userid VARCHAR2(20)
2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20))));

Table altered.

SQL> ALTER TABLE ziggy_json ADD (costcenter VARCHAR2(6)
2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6))));

Table altered.

 

Next, create the index based on these newly created virtual columns:

SQL> CREATE INDEX ziggy_user_costctr_idx on ziggy_json(userid, costcenter);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

And then write a simplified version of the SQL to reference the virtual columns:

SQL> select * from ziggy_json where userid='MTOM' and costcenter='B42';

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
--------------------------------------------------------------------------------

USERID COSTCE
-------------------- ------
100001 24-JUN-16
{"PONumber" : 1000001,
"Reference" : "MTOM-20161",
MTOM B42
  
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 5717455

--------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                   | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                        |    1 |  1535 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON             |    1 |  1535 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_USER_COSTCTR_IDX |    1 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("USERID"='MTOM' AND "COSTCENTER"='B42')
   
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1396 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The index is again used as expected.

Of course, if we still want to reference the JSON functions directly within the SQL, the query can still be written as previously:

SQL> select * from ziggy_json 
     where json_value(ziggy_order, '$.User' returning varchar2(20))='MTOM' and
           json_value(ziggy_order, '$.CostCenter' returning varchar2(6))='B42';

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
--------------------------------------------------------------------------------

USERID COSTCE
-------------------- ------
100001 24-JUN-16
{"PONumber" : 1000001,
"Reference" : "MTOM-20161",
MTOM B42
   
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 5717455

--------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                   | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                        |    1 |  1535 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON             |    1 |  1535 |       4 (0) | 00:00:01 |
|*2  | INDEX RANGE SCAN                    | ZIGGY_USER_COSTCTR_IDX |    1 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ZIGGY_JSON"."USERID"='MTOM' AND "ZIGGY_JSON"."COSTCENTER"='B42')

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1396 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The index is again used as expected.

I’ll next look at using a JSON Text based index to effectively index the entire JSON document.

Oracle 12c: Indexing JSON In The Database Part I (Lazarus) August 2, 2016

Posted by Richard Foote in 12c, JSON, Oracle Indexes.
1 comment so far

One of the very cool new features introduced in Oracle Database 12c Rel 1 is the ability to store JavaScript Object Notation (JSON) documents within the database. Unlike XML which has its own data type, JSON data can be stored as VARCHAR2, CLOB or BLOB data types, but with a JSON check constraint to ensure the stored parsed document meets JSON document standards.

This enables ‘No-SQL’ schema-less type development within the Oracle Database  for “next generation” applications. Although developers can work with the JSON store without using SQL (via say RESTful APIs directly within the database), you can still leverage the power of SQL for reporting and analytics type purposes. Of course, reading JSON data efficiently then becomes important, and that’s where indexing the JSON document store kicks in.

Let’s look at a simple example.

Firstly, let’s create a table with a column called ZIGGY_ORDER which stores JSON documents.

SQL> CREATE TABLE ziggy_json  (id number,  ziggy_date date,  ziggy_order CLOB  CONSTRAINT ensure_ziggy_json CHECK (ziggy_order IS JSON));</pre>
Table created.

The ZIGGY_ORDER column has a JSON check constraint which ensures only valid JSON documents can be stored. An attempt to insert a row with invalid JSON data will fail:

SQL> insert into ziggy_json values (1, sysdate, '{"This is not legal JSON"}');
insert into ziggy_json values (1, sysdate, '{"This is not legal JSON"}')
*
ERROR at line 1:
ORA-02290: check constraint (BOWIE.ENSURE_ZIGGY_JSON) violated

Let’s insert some JSON data into the table. Note I’m using ROWNUM to insert some almost unique JSON data and then re-inserting the data again to get data worth accessing via an index:

SQL> insert into ziggy_json
2 select
3 rownum,
4 SYSdate,
5 '{"PONumber" : ' || rownum || ',
6 "Reference" : "DBOWIE-2016' || rownum || '",
7 "Requestor" : "David Bowie",
8 "User" : "DBOWIE",
9 "CostCenter" : "A42",
10 "ShippingInstructions" : {"name" : "David Bowie",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
17 {"type" : "Mobile", "number" : "417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual connect by level <= 1000000;

1000000 rows created.

SQL> insert into ziggy_json select * from ziggy_json;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

There are a number of ways we can reference and access data from within JSON. One method is use .dot notation to access specific JSON data elements of interest:

SQL> SELECT z.ziggy_order.PONumber FROM ziggy_json z where z.ziggy_order.PONumber=42;

PONUMBER
--------------------------------------------------------------------------------

42
42

We can also use the JSON_VALUE function to access specific JSON data of interest:

SQL> select json_value(ziggy_order, '$.Reference') from ziggy_json  
     where json_value(ziggy_order, '$.PONumber' returning number)=42;

JSON_VALUE(ZIGGY_ORDER,'$.REFERENCE')
--------------------------------------------------------------------------------

DBOWIE-201642
DBOWIE-201642

Without an index, the CBO has no choice but to use an expensive Full Table Scan:

SQL> SELECT z.ziggy_order.PONumber FROM ziggy_json z where z.ziggy_order.PONumber=42;

Elapsed: 00:00:34.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1413303849

--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows  | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |            | 20000 |   28M |    129K (3) | 00:00:06 |
|* 1 | TABLE ACCESS FULL | ZIGGY_JSON | 20000 |   28M |    129K (3) | 00:00:06 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_NUMBER(JSON_QUERY("Z"."ZIGGY_ORDER" FORMAT JSON ,
'$.PONumber' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL
ON ERROR))=42)

Statistics
----------------------------------------------------------
     0 recursive calls
     0 db block gets
500057 consistent gets
299745 physical reads
     0 redo size
   596 bytes sent via SQL*Net to client
   552 bytes received via SQL*Net from client
     2 SQL*Net roundtrips to/from client
     0 sorts (memory)
     0 sorts (disk)
     2 rows processed

 

At some 500,057 consistent gets and an elapsed time of 34.04 seconds, the above query is “slow” and expensive.

So one option to speed things up is to create a function-based index using the JSON_VALUE function. This can then be used to quickly access data that matches specific JSON name values of interest:

SQL> CREATE INDEX ziggy_po_num_idx ON 
     ziggy_json (json_value(ziggy_order, '$.PONumber' RETURNING NUMBER ERROR ON ERROR));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

As it’s a function-based index, collecting statistics on the implicitly created virtual column is advisable. If we now re-run the query:

SQL> select * from ziggy_json 
     where json_value(ziggy_order, '$.PONumber' returning number)=42;

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
------------------------------------------------------------
42 24-JUN-16
{"PONumber" : 42,
"Reference" : "DBOWIE-201642",
42 24-JUN-16
{"PONumber" : 42,
"Reference" : "DBOWIE-201642",

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1939019025

--------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name             | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                  |    2 |  3058 |       5 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON       |    2 |  3058 |       5 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX |    2 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING NUMBER ERROR ON
ERROR)=42)
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

The query now uses the JSON function-based index, performs just 9 consistent gets and completes in 0.01 second.

The index can be used for any data accesses in which the CBO considers the index the cheaper alternative:

SQL> select * from ziggy_json where json_value(ziggy_order, '$.PONumber' returning number)<42;

82 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1939019025

--------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name             | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                  |   82 |  122K |      86 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON       |   82 |  122K |      86 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX |   82 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING NUMBER ERROR ON
ERROR)<42)

Statistics
----------------------------------------------------------
    0 recursive calls
    0 db block gets
  249 consistent gets
    0 physical reads
    0 redo size
50623 bytes sent via SQL*Net to client
22528 bytes received via SQL*Net from client
  166 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
   82 rows processed

Indexes can also be created based on the .dot JSON notation:

SQL> CREATE INDEX ziggy_po_num_idx2 ON ziggy_json z (z.ziggy_order.PONumber);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

A query based on the associated JSON .dot notation can now run efficiently via the index:

SQL> SELECT * FROM ziggy_json z where z.ziggy_order.PONumber='42';

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 4224387816

---------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name              | Rows | Bytes| Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                   |    2 | 3062 |       6 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON        |    2 | 3062 |       6 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX2 |    2 |      |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_QUERY("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING VARCHAR2(4000) ASIS
WITHOUT ARRAY WRAPPER NULL ON ERROR)='42')

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

Query again uses the index and is just as efficient with almost immediate response times at just 9 consistent gets.

JSON indexes can also be used to police and ensure data constraints and integrity (which can be problematic with JSON documents). The following numeric index example also implicitly adds data constraint capabilities:

SQL> CREATE INDEX ziggy_po_num_idx3 ON ziggy_json z (to_number(z.ziggy_order.PONumber));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

SQL> insert into ziggy_json
2 select
3 rownum,
4 SYSdate,
5 '{"PONumber" : "200000A",
6 "Reference" : "DBOWIE-2016' || rownum || '",
7 "Requestor" : "David Bowie",
8 "User" : "DBOWIE",
9 "CostCenter" : "A42",
10 "ShippingInstructions" : {"name" : "David Bowie",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" :"417-555-7777"},
17 {"type" : "Mobile", "number" :"417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual;
insert into ziggy_json
*
ERROR at line 1:
ORA-01722: invalid number

The PONumber value has to now be numeric for it to be successfully added to the JSON document store. The index of course can also be used for efficient data access:

SQL> SELECT * FROM ziggy_json z where to_number(z.ziggy_order.PONumber)=42;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 692052820

---------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name              | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                   |    2 |  3048 |      5 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON        |    2 |  3048 |      5 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX3 |    2 |       |      3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_NUMBER(JSON_QUERY("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING
VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON ERROR))=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

In Part II, I’ll look at how to create composite JSON indexes and how to use a text index to automatically index all name fields within a JSON document.