jump to navigation

12c Indexing Extended Data Types Part I (A Big Hurt) September 12, 2013

Posted by Richard Foote in 12c, Extended Data Types, Function Based Indexes, Oracle Indexes, Unique Indexes.
12 comments

The maximum size for VARCHAR2, NVARCHAR and RAW columns has been extended to 32767 bytes with the Oracle 12c Database. However, indexing such columns with standard indexes comes with some challenges.

These extended data types are not enabled by default within the database but can easily be done so by following these steps:

  1. Restart the database in UPGRADE mode
  2. Change the setting of MAX_STRING_SIZE to EXTENDED
  3. Run the rdbms/admin/utl32k.sql script as sysdba
  4. Restart the database

We can now create a table with a larger than 4000 byte VARCHAR2 column (Note such larger column values are actually stored out of line from the rest of the table, I might discuss this in another post) :

SQL> create table bowie (id number, text varchar2(32000));
Table created.

However, if we try now to create an index on such a column:

SQL> create index bowie_text_i on bowie(text);
create index bowie_text_i on bowie(text)
 *
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded

We find Oracle complains that the possible index length is going to be too large for my (8K) block sized index. So, is it possible to index such extended columns ?

Let’s populate this table with some data:

SQL> insert into bowie (id, text) values (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
1 row created.

SQL> commit;

Commit complete.

SQL> select length(text) from bowie;

LENGTH(TEXT)
------------
        1110

SQL> insert into bowie (id, text)
     select 2, text||text||text||text||text||text||text||text||text||text
     from bowie;

1 row created.

SQL> commit;

Commit complete.

SQL> select length(text) from bowie;

LENGTH(TEXT)
------------
        1110
       11100

SQL> insert into bowie (id, text)
     select rownum+2, to_char(rownum)||'BOWIE'
     from dual connect by level<=99998;

99998 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

So yes, we definitely have at least one very large Text value (some 11100 bytes) in our table. How cool. One method of creating a valid index on this extended column is to use a function-based index based on a hash value of this column. For example:

SQL> create index bowie_hash_text_i on bowie(standard_hash(text));

Index created.

SQL> select index_name, num_rows, leaf_blocks from dba_indexes where index_name = 'BOWIE_HASH_TEXT_I';

INDEX_NAME             NUM_ROWS LEAF_BLOCKS
-------------------- ---------- -----------
BOWIE_HASH_TEXT_I        100000         447

This index can now be used effectively for subsequent equality based predicates, for example:

SQL> select * from bowie where text = '42BOWIE';

Execution Plan
----------------------------------------------------------

Plan hash value: 1900956348
---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |     1 |    16|   203   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE             |     1 |    16|   203   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_HASH_TEXT_I |   400 |      |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(INTERNAL_FUNCTION("TEXT") AND "TEXT"='42BOWIE')
2 - access(STANDARD_HASH("TEXT")=HEXTORAW('A2C98939EDB479BC3EB0CDC560DDCD1575D47F62'))

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
4  consistent gets
0  physical reads
0  redo size
610  bytes sent via SQL*Net to client
544  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)

1  rows processed

So the index has been used to very efficiently retrieve data based on an equality predicate on the extended TEXT column.

However, range based predicates are problematic as Oracle has no easy way to find and retrieve all such data via the index when the data in the index is effectively randomised hashed values. For example:

SQL> select * from bowie where text like 'aaaaaaaaaaaaaaaaaaaaaa%';

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |    16 |   208   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     1 |    16 |   208   (2)| 00:00:01 |
---------------------------------------------------------------------------

SQL> select * from bowie where text between '4299BOWIE' and '42BOWIE';

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     2 |    32 |   208   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     2 |    32 |   208   (2)| 00:00:01 |
---------------------------------------------------------------------------

SQL> select * from bowie where text > 'zzz';

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |    17 |   219   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     1 |    17 |   219   (2)| 00:00:01 |
---------------------------------------------------------------------------

The above are all examples of predicates that can’t use our hash based function-based index, even though the CBO is estimating very few rows to be returned.

If we try now to make this extended column unique via a constraint:

SQL> alter table bowie add constraint bowie_text_unq unique (text);
alter table bowie add constraint bowie_text_unq unique (text)
*
ERROR at line 1:

ORA-01450: maximum key length (6398) exceeded

We hit our problem again. Oracle tries to make a unique index on the Text column, but it can’t because the extended column definition could potentially exceed the maximum allowable key length.

We can get around this in a similar fashion, but by adding a virtual hash column to the table and basing the Unique constraint on this column instead:

SQL> drop index bowie_hash_text_i;

Index dropped.

SQL> alter table bowie add (text_hash as (standard_hash(text)));

Table altered.

SQL> alter table bowie add constraint bowie_text_unq unique (text_hash);

Table altered.

This can now be used to effectively protect the uniqueness of the original Text column:

SQL> insert into bowie (id, text) values (1000001, '42BOWIE');
insert into bowie (id, text) values (1000001, '42BOWIE')
*
ERROR at line 1:

ORA-00001: unique constraint (BOWIE.BOWIE_TEXT_UNQ) violated

This index can now be used in a similar manner as before for equality based predicates:

SQL> select * from bowie where text = '42BOWIE';

ID  TEXT       TEXT_HASH
--- ---------- ----------------------------------------
44     42BOWIE A2C98939EDB479BC3EB0CDC560DDCD1575D47F62

Execution Plan
----------------------------------------------------------

Plan hash value: 2691947611
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    16 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| BOWIE          |     1 |    16 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BOWIE_TEXT_UNQ |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(INTERNAL_FUNCTION("TEXT") AND "TEXT"='42BOWIE')
2 - access("BOWIE"."TEXT_HASH"=HEXTORAW('A2C98939EDB479BC3EB0CDC560DDCD1575D47F62'))

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
3  consistent gets
0  physical reads
0  redo size
702  bytes sent via SQL*Net to client
544  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

But with the same restrictions with range based predicates:

SQL> select * from bowie where text between '429BOWIE' and '42BOWIE';

ID  TEXT       TEXT_HASH
--- ---------- ----------------------------------------
44     42BOWIE A2C98939EDB479BC3EB0CDC560DDCD1575D47F62
431   429BOWIE A7E2B59E1429DB4964225E7A98A19998BC3D2AFD

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     2 |    32 |   208   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     2 |    32 |   208   (2)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(INTERNAL_FUNCTION("TEXT") AND "TEXT"<='42BOWIE' AND "TEXT">='429BOWIE')

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
758  consistent gets
0  physical reads
0  redo size
787  bytes sent via SQL*Net to client
544  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2  rows processed

I’ll look at other indexing options with these new extended columns in Part II.