jump to navigation

Oracle 12c: Indexing JSON in the Database Part III (Paperback Writer) September 2, 2016

Posted by Richard Foote in 12c, JSON, JSON Text Index, Oracle Indexes.
2 comments

In Part I and Part II, we looked at how to index specific attributes within a JSON document store within an Oracle 12c database.

But what if we’re not sure which specific attributes might benefit from an index or indeed, as JSON is by it’s nature a schema-less way to store data, what if we’re not entirely sure what attributes might be present currently or in the future.

On a JSON document store within the Oracle Database, you can create a special JSON aware Text Index that can automatically index any field/attribute within a JSON document and use a Text based function to then search efficiently for data from any attribute.

Using the same table created in Part I, you can create a JSON Text index as follows:

SQL> CREATE INDEX ziggy_search_idx ON ziggy_json (ziggy_order)
2 INDEXTYPE IS CTXSYS.CONTEXT
3 PARAMETERS ('section group CTXSYS.JSON_SECTION_GROUP SYNC (ON COMMIT)');
Index created.

Note this Text index is (optionally) defined to be automatically synchronised when data in the ZIGGY_JSON table is committed.

We can use the JSON_TEXTCONTAINS Oracle Text function to efficiently access data for any data within the JSON defined column. For example:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$.Reference', 'DBOWIE-201642');

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time      |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |    1 |  1534 |    2550 (0) |  00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |    1 |  1534 |    2550 (0) |  00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |    2549 (0) |  00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{DBOWIE-201642}
 INPATH(/Reference)')>0)

Statistics
----------------------------------------------------------
  65 recursive calls
   0 db block gets
 118 consistent gets
   0 physical reads
   0 redo size
1863 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

Note the JSON-based Text index is used to retrieve data efficiently.

The Text index can also be used to search data efficiently from within an array set:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$.LineItems.Part.Description', 'Low');

no rows selected

Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |    1 |  1534 |    5927 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |    1 |  1534 |    5927 (0) | 00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |    5927 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{Low}
 INPATH(/LineItems/Part/Description)')>0)

Statistics
----------------------------------------------------------
132 recursive calls
  0 db block gets
182 consistent gets
  0 physical reads
  0 redo size
627 bytes sent via SQL*Net to client
552 bytes received via SQL*Net from client
  2 SQL*Net roundtrips to/from client
  0 sorts (memory)
  0 sorts (disk)
  0 rows processed

We can also search for a specific data value across any attribute within the JSON document store:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$', '4242');

Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |    1 |  1534 |       2 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |    1 |  1534 |       2 (0) | 00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |       1 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

 2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{4242}')>0)
     
Statistics
----------------------------------------------------------
  10 recursive calls
   0 db block gets
  32 consistent gets
   0 physical reads
   0 redo size
1865 bytes sent via SQL*Net to client
1088 bytes received via SQL*Net from client
   6 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   2 rows processed

Let’s now add more data to the JSON column, but this time introducing a few new attributes (such as AlbumName):

SQL> insert into ziggy_json
 2 select
 3 rownum,
 4 SYSdate,
 5 '{"AlbumId" : ' || rownum || ',
 6 "AlbumName" : "HUNKY DORY",
 7 "ArtistName" : "David Bowie"}'
 8 from dual connect by level <= 10;
10 rows created.

SQL> commit;

Commit complete.

As the JSON-based Text index was defined to be automatically synchronised when we commit data in the table, these new attributes can be immediately searched and accessed via the index:

SQL> SELECT * FROM ziggy_json WHERE json_textcontains(ziggy_order, '$.AlbumName', 'HUNKY DORY');

10 rows selected.

Elapsed: 00:00:00.02

Execution Plan
----------------------------------------------------------
Plan hash value: 3069169778

------------------------------------------------------------------------------------------------
| Id | Operation                   | Name             | Rows | Bytes | Cost (%CPU) | Time     |
------------------------------------------------------------------------------------------------
|  0 | SELECT STATEMENT            |                  |  198 |  296K |    1948 (0) | 00:00:01 |
|  1 | TABLE ACCESS BY INDEX ROWID | ZIGGY_JSON       |  198 |  296K |    1948 (0) | 00:00:01 |
|* 2 | DOMAIN INDEX                | ZIGGY_SEARCH_IDX |      |       |    1780 (0) | 00:00:01 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("CTXSYS"."CONTAINS"("ZIGGY_JSON"."ZIGGY_ORDER",'{HUNKY DORY}
 INPATH(/AlbumName)')>0)
     
Statistics
----------------------------------------------------------
  48 recursive calls
   0 db block gets
 103 consistent gets
   0 physical reads
   0 redo size
6751 bytes sent via SQL*Net to client
3232 bytes received via SQL*Net from client
  22 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
  10 rows processed

Not only can JSON data be stored within the Oracle 12c database, but we have a number of index strategies available to search such data efficiently.

Advertisements

Oracle 12c: Indexing JSON in the Database Part II (Find A Little Wood) August 5, 2016

Posted by Richard Foote in 12c, JSON, JSON Text Index, Oracle Indexes.
4 comments

In Part I, we looked at how you can now store JSON documents within the Oracle 12c Database. For efficient accesses to JSON documents stored in the Oracle database, we can either create a function-based index based on the JSON_VALUE function or on JSON .dot notation.

These indexes are useful for indexing specific JSON attributes, but what if we want to index multiple JSON attributes within a single index structure.

To start, I’m just going to add an extra row to increase the selectivity of other columns.

SQL> insert into ziggy_json
2 select
3 100001,
4 SYSdate,
5 '{"PONumber" : 1000001,
6 "Reference" : "MTOM-20161",
7 "Requestor" : "Major Tom",
8 "User" : "MTOM",
9 "CostCenter" : "B42",
10 "ShippingInstructions" : {"name" : "Major Tom",
11 "Address": {"street" : "42 Ziggy Street",
12 "city" : "Canberra",
13 "state" : "ACT",
14 "zipCode" : 2601,
15 "country" : "Australia"},
16 "Phone" : [{"type" : "Office", "number" : "417-555-7777"},
17 {"type" : "Mobile", "number" : "417-555-1234"}]},
18 "Special Instructions" : null,
19 "AllowPartialShipment" : true,
20 "LineItems" : [{"ItemNumber" : 1,
21 "Part" : {"Description" : "Hunky Dory",
22 "UnitPrice" : 10.95},
23 "Quantity" : 5.0},
24 {"ItemNumber" : 2,
25 "Part" : {"Description" : "Pin-Ups",
26 "UnitPrice" : 10.95},
27 "Quantity" : 3.0}]}'
28 from dual;

1 row created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

We can still create composite indexes based on the JSON_VALUE function as we can with conventional columns:

SQL> create index ziggy_json_idx3 on 
     ziggy_json(json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20)), 
                json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)));

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

If we now run a query with a couple of JSON_VALUE based predicates:

SQL> select * from ziggy_json
     where json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20))='MTOM' and 
           json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6)) = 'B42';

Execution Plan
----------------------------------------------------------
Plan hash value: 3402615542

-------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name            | Rows | Bytes | Cost (%CPU) | Time     |
-------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                 |    1 |  1533 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON      |    1 |  1533 |       4 (0) | 00:00:01 |
|*2  | INDEX RANGE SCAN                    | ZIGGY_JSON_IDX3 |    1 |       |       3 (0) | 00:00:01 |
-------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access(JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.User' RETURNING VARCHAR2(20) NULL ON
ERROR)='MTOM' AND JSON_VALUE("ZIGGY_ORDER" FORMAT JSON , '$.CostCenter' RETURNING VARCHAR2(6) NULL ON ERROR)='B42')
 
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1248 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The composite index is effectively used by the CBO as expected.

It does though make our SQL a little cumbersome to write. To simplify things a tad, we could create a couple of virtual columns based on these functions, create the JSON function-based indexes on these virtual columns and simplify the SQL accordingly.

First, we create the virtual columns (note they’re virtual columns and so consume no storage):

 

SQL> ALTER TABLE ziggy_json ADD (userid VARCHAR2(20)
2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.User' RETURNING VARCHAR2(20))));

Table altered.

SQL> ALTER TABLE ziggy_json ADD (costcenter VARCHAR2(6)
2 GENERATED ALWAYS AS (json_value(ziggy_order, '$.CostCenter' RETURNING VARCHAR2(6))));

Table altered.

 

Next, create the index based on these newly created virtual columns:

SQL> CREATE INDEX ziggy_user_costctr_idx on ziggy_json(userid, costcenter);

Index created.

SQL> exec dbms_stats.gather_table_stats(ownname=>null, tabname=>'ZIGGY_JSON');

PL/SQL procedure successfully completed.

And then write a simplified version of the SQL to reference the virtual columns:

SQL> select * from ziggy_json where userid='MTOM' and costcenter='B42';

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
--------------------------------------------------------------------------------

USERID COSTCE
-------------------- ------
100001 24-JUN-16
{"PONumber" : 1000001,
"Reference" : "MTOM-20161",
MTOM B42
  
Elapsed: 00:00:00.01

Execution Plan
----------------------------------------------------------
Plan hash value: 5717455

--------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                   | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                        |    1 |  1535 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON             |    1 |  1535 |       4 (0) | 00:00:01 |
|* 2 | INDEX RANGE SCAN                    | ZIGGY_USER_COSTCTR_IDX |    1 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("USERID"='MTOM' AND "COSTCENTER"='B42')
   
Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1396 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The index is again used as expected.

Of course, if we still want to reference the JSON functions directly within the SQL, the query can still be written as previously:

SQL> select * from ziggy_json 
     where json_value(ziggy_order, '$.User' returning varchar2(20))='MTOM' and
           json_value(ziggy_order, '$.CostCenter' returning varchar2(6))='B42';

ID ZIGGY_DAT
---------- ---------
ZIGGY_ORDER
--------------------------------------------------------------------------------

USERID COSTCE
-------------------- ------
100001 24-JUN-16
{"PONumber" : 1000001,
"Reference" : "MTOM-20161",
MTOM B42
   
Elapsed: 00:00:00.00

Execution Plan
----------------------------------------------------------
Plan hash value: 5717455

--------------------------------------------------------------------------------------------------------------
| Id | Operation                           | Name                   | Rows | Bytes | Cost (%CPU) | Time     |
--------------------------------------------------------------------------------------------------------------
| 0  | SELECT STATEMENT                    |                        |    1 |  1535 |       4 (0) | 00:00:01 |
| 1  | TABLE ACCESS BY INDEX ROWID BATCHED | ZIGGY_JSON             |    1 |  1535 |       4 (0) | 00:00:01 |
|*2  | INDEX RANGE SCAN                    | ZIGGY_USER_COSTCTR_IDX |    1 |       |       3 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("ZIGGY_JSON"."USERID"='MTOM' AND "ZIGGY_JSON"."COSTCENTER"='B42')

Statistics
----------------------------------------------------------
   0 recursive calls
   0 db block gets
   5 consistent gets
   0 physical reads
   0 redo size
1396 bytes sent via SQL*Net to client
 820 bytes received via SQL*Net from client
   4 SQL*Net roundtrips to/from client
   0 sorts (memory)
   0 sorts (disk)
   1 rows processed

The index is again used as expected.

I’ll next look at using a JSON Text based index to effectively index the entire JSON document.