jump to navigation

Oracle 12c: Indexing JSON In The Database Part I (Lazarus) August 2, 2016

Posted by Richard Foote in 12c, JSON, Oracle Indexes.
trackback

One of the very cool new features introduced in Oracle Database 12c Rel 1 is the ability to store JavaScript Object Notation (JSON) documents within the database. Unlike XML which has its own data type, JSON data can be stored as VARCHAR2, CLOB or BLOB data types, but with a JSON check constraint to ensure the stored parsed document meets JSON document standards.

This enables ‘No-SQL’ schema-less type development within the Oracle Database  for “next generation” applications. Although developers can work with the JSON store without using SQL (via say RESTful APIs directly within the database), you can still leverage the power of SQL for reporting and analytics type purposes. Of course, reading JSON data efficiently then becomes important, and that’s where indexing the JSON document store kicks in.

Let’s look at a simple example.

Firstly, let’s create a table with a column called ZIGGY_ORDER which stores JSON documents.

SQL> CREATE TABLE ziggy_json  (id number,  ziggy_date date,  ziggy_order CLOB  CONSTRAINT ensure_ziggy_json CHECK (ziggy_order IS JSON));</pre>
Table created.

The ZIGGY_ORDER column has a JSON check constraint which ensures only valid JSON documents can be stored. An attempt to insert a row with invalid JSON data will fail:

SQL> insert into ziggy_json values (1, sysdate, '{"This is not legal JSON"}');
insert into ziggy_json values (1, sysdate, '{"This is not legal JSON"}')
*
ERROR at line 1:
ORA-02290: check constraint (BOWIE.ENSURE_ZIGGY_JSON) violated

Let’s insert some JSON data into the table. Note I’m using ROWNUM to insert some almost unique JSON data and then re-inserting the data again to get data worth accessing via an index:

SQL> insert into ziggy_json
2 select
3 rownum,
4 SYSdate,
5 '{"PONumber" : ' || rownum || ',
6 "Reference" : "DBOWIE-2016' || rownum || '",
7 "Requestor" : "David Bowie",
8 "User" : "DBOWIE",
9 "CostCenter" : "A42",
10 "ShippingInstructions" : {"name" : "David Bowie",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
17 {"type" : "Mobile", "number" : "417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual connect by level <= 1000000;

1000000 rows created.

SQL> insert into ziggy_json select * from ziggy_json;

1000000 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

There are a number of ways we can reference and access data from within JSON. One method is use .dot notation to access specific JSON data elements of interest:

SQL> SELECT z.ziggy_order.PONumber FROM ziggy_json z where z.ziggy_order.PONumber=42;

PONUMBER
--------------------------------------------------------------------------------

42
42

We can also use the JSON_VALUE function to access specific JSON data of interest:

SQL> select json_value(ziggy_order, '$.Reference') from ziggy_json  
     where json_value(ziggy_order, '$.PONumber' returning number)=42;

JSON_VALUE(ZIGGY_ORDER,'$.REFERENCE')
--------------------------------------------------------------------------------

DBOWIE-201642
DBOWIE-201642

Without an index, the CBO has no choice but to use an expensive Full Table Scan:

SQL> SELECT z.ziggy_order.PONumber FROM ziggy_json z where z.ziggy_order.PONumber=42;

Elapsed: 00:00:34.04

Execution Plan
----------------------------------------------------------
Plan hash value: 1413303849

--------------------------------------------------------------------------------
| Id | Operation         | Name       | Rows  | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------
| 0  | SELECT STATEMENT  |            | 20000 |   28M |    129K (3) | 00:00:06 |
|* 1 | TABLE ACCESS FULL | ZIGGY_JSON | 20000 |   28M |    129K (3) | 00:00:06 |
--------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(TO_NUMBER(JSON_QUERY("Z"."ZIGGY_ORDER" FORMAT JSON ,
'$.PONumber' RETURNING VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL
ON ERROR))=42)

Statistics
----------------------------------------------------------
     0 recursive calls
     0 db block gets
500057 consistent gets
299745 physical reads
     0 redo size
   596 bytes sent via SQL*Net to client
   552 bytes received via SQL*Net from client
     2 SQL*Net roundtrips to/from client
     0 sorts (memory)
     0 sorts (disk)
     2 rows processed

 

At some 500,057 consistent gets and an elapsed time of 34.04 seconds, the above query is “slow” and expensive.

So one option to speed things up is to create a function-based index using the JSON_VALUE function. This can then be used to quickly access data that matches specific JSON name values of interest:

SQL> CREATE INDEX ziggy_po_num_idx ON 
     ziggy_json (json_value(ziggy_order, '$.PONumber' RETURNING NUMBER ERROR ON ERROR));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

As it’s a function-based index, collecting statistics on the implicitly created virtual column is advisable. If we now re-run the query:

SQL> select * from ziggy_json 
     where json_value(ziggy_order, '$.PONumber' returning number)=42;

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
------------------------------------------------------------
42 24-JUN-16
{"PONumber" : 42,
"Reference" : "DBOWIE-201642",
42 24-JUN-16
{"PONumber" : 42,
"Reference" : "DBOWIE-201642",

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1939019025

--------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name             | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                  |    2 |  3058 |       5 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON       |    2 |  3058 |       5 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX |    2 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING NUMBER ERROR ON
ERROR)=42)
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

The query now uses the JSON function-based index, performs just 9 consistent gets and completes in 0.01 second.

The index can be used for any data accesses in which the CBO considers the index the cheaper alternative:

SQL> select * from ziggy_json where json_value(ziggy_order, '$.PONumber' returning number)<42;

82 rows selected.

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 1939019025

--------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name             | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                  |   82 |  122K |      86 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON       |   82 |  122K |      86 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX |   82 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING NUMBER ERROR ON
ERROR)<42)

Statistics
----------------------------------------------------------
    0 recursive calls
    0 db block gets
  249 consistent gets
    0 physical reads
    0 redo size
50623 bytes sent via SQL*Net to client
22528 bytes received via SQL*Net from client
  166 SQL*Net roundtrips to/from client
    0 sorts (memory)
    0 sorts (disk)
   82 rows processed

Indexes can also be created based on the .dot JSON notation:

SQL> CREATE INDEX ziggy_po_num_idx2 ON ziggy_json z (z.ziggy_order.PONumber);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

A query based on the associated JSON .dot notation can now run efficiently via the index:

SQL> SELECT * FROM ziggy_json z where z.ziggy_order.PONumber='42';

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 4224387816

---------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name              | Rows | Bytes| Cost (%CPU) | Time     |
---------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                   |    2 | 3062 |       6 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON        |    2 | 3062 |       6 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX2 |    2 |      |       3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_QUERY("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING VARCHAR2(4000) ASIS
WITHOUT ARRAY WRAPPER NULL ON ERROR)='42')

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

Query again uses the index and is just as efficient with almost immediate response times at just 9 consistent gets.

JSON indexes can also be used to police and ensure data constraints and integrity (which can be problematic with JSON documents). The following numeric index example also implicitly adds data constraint capabilities:

SQL> CREATE INDEX ziggy_po_num_idx3 ON ziggy_json z (to_number(z.ziggy_order.PONumber));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

SQL> insert into ziggy_json
2 select
3 rownum,
4 SYSdate,
5 '{"PONumber" : "200000A",
6 "Reference" : "DBOWIE-2016' || rownum || '",
7 "Requestor" : "David Bowie",
8 "User" : "DBOWIE",
9 "CostCenter" : "A42",
10 "ShippingInstructions" : {"name" : "David Bowie",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" :"417-555-7777"},
17 {"type" : "Mobile", "number" :"417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual;
insert into ziggy_json
*
ERROR at line 1:
ORA-01722: invalid number

The PONumber value has to now be numeric for it to be successfully added to the JSON document store. The index of course can also be used for efficient data access:

SQL> SELECT * FROM ziggy_json z where to_number(z.ziggy_order.PONumber)=42;

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 692052820

---------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name              | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                   |    2 |  3048 |      5 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON        |    2 |  3048 |      5 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_PO_NUM_IDX3 |    2 |       |      3 (0) | 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(TO_NUMBER(JSON_QUERY("ZIGGY_ORDER" FORMAT JSON , '$.PONumber' RETURNING
VARCHAR2(4000) ASIS WITHOUT ARRAY WRAPPER NULL ON ERROR))=42)

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   9 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

In Part II, I’ll look at how to create composite JSON indexes and how to use a text index to automatically index all name fields within a JSON document.

Comments»

No comments yet — be the first.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: