12c Indexing Extended Data Types Part I (A Big Hurt)

12c Indexing Extended Data Types Part I (A Big Hurt) September 12, 2013

Posted by Richard Foote in 12c, Extended Data Types, Function Based Indexes, Oracle Indexes, Unique Indexes.
trackback

The maximum size for VARCHAR2, NVARCHAR and RAW columns has been extended to 32767 bytes with the Oracle 12c Database. However, indexing such columns with standard indexes comes with some challenges.

These extended data types are not enabled by default within the database but can easily be done so by following these steps:

Restart the database in UPGRADE mode
Change the setting of MAX_STRING_SIZE to EXTENDED
Run the rdbms/admin/utl32k.sql script as sysdba
Restart the database

We can now create a table with a larger than 4000 byte VARCHAR2 column (Note such larger column values are actually stored out of line from the rest of the table, I might discuss this in another post) :

SQL> create table bowie (id number, text varchar2(32000));
Table created.

However, if we try now to create an index on such a column:

SQL> create index bowie_text_i on bowie(text);
create index bowie_text_i on bowie(text)
 *
ERROR at line 1:
ORA-01450: maximum key length (6398) exceeded

We find Oracle complains that the possible index length is going to be too large for my (8K) block sized index. So, is it possible to index such extended columns ?

Let’s populate this table with some data:

SQL> insert into bowie (id, text) values (1, 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa');
1 row created.

SQL> commit;

Commit complete.

SQL> select length(text) from bowie;

LENGTH(TEXT)
------------
        1110

SQL> insert into bowie (id, text)
     select 2, text||text||text||text||text||text||text||text||text||text
     from bowie;

1 row created.

SQL> commit;

Commit complete.

SQL> select length(text) from bowie;

LENGTH(TEXT)
------------
        1110
       11100

SQL> insert into bowie (id, text)
     select rownum+2, to_char(rownum)||'BOWIE'
     from dual connect by level<=99998;

99998 rows created.

SQL> commit;

Commit complete.

SQL> exec dbms_stats.gather_table_stats(ownname=>user, tabname=>'BOWIE', method_opt=>'FOR ALL COLUMNS SIZE 1');

PL/SQL procedure successfully completed.

So yes, we definitely have at least one very large Text value (some 11100 bytes) in our table. How cool. One method of creating a valid index on this extended column is to use a function-based index based on a hash value of this column. For example:

SQL> create index bowie_hash_text_i on bowie(standard_hash(text));

Index created.

SQL> select index_name, num_rows, leaf_blocks from dba_indexes where index_name = 'BOWIE_HASH_TEXT_I';

INDEX_NAME             NUM_ROWS LEAF_BLOCKS
-------------------- ---------- -----------
BOWIE_HASH_TEXT_I        100000         447

This index can now be used effectively for subsequent equality based predicates, for example:

SQL> select * from bowie where text = '42BOWIE';

Execution Plan
----------------------------------------------------------

Plan hash value: 1900956348
---------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name              | Rows  | Bytes| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                   |     1 |    16|   203   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BOWIE             |     1 |    16|   203   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | BOWIE_HASH_TEXT_I |   400 |      |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(INTERNAL_FUNCTION("TEXT") AND "TEXT"='42BOWIE')
2 - access(STANDARD_HASH("TEXT")=HEXTORAW('A2C98939EDB479BC3EB0CDC560DDCD1575D47F62'))

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
4  consistent gets
0  physical reads
0  redo size
610  bytes sent via SQL*Net to client
544  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)

1  rows processed

So the index has been used to very efficiently retrieve data based on an equality predicate on the extended TEXT column.

However, range based predicates are problematic as Oracle has no easy way to find and retrieve all such data via the index when the data in the index is effectively randomised hashed values. For example:

SQL> select * from bowie where text like 'aaaaaaaaaaaaaaaaaaaaaa%';

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |    16 |   208   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     1 |    16 |   208   (2)| 00:00:01 |
---------------------------------------------------------------------------

SQL> select * from bowie where text between '4299BOWIE' and '42BOWIE';

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     2 |    32 |   208   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     2 |    32 |   208   (2)| 00:00:01 |
---------------------------------------------------------------------------

SQL> select * from bowie where text > 'zzz';

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507

---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     1 |    17 |   219   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     1 |    17 |   219   (2)| 00:00:01 |
---------------------------------------------------------------------------

The above are all examples of predicates that can’t use our hash based function-based index, even though the CBO is estimating very few rows to be returned.

If we try now to make this extended column unique via a constraint:

SQL> alter table bowie add constraint bowie_text_unq unique (text);
alter table bowie add constraint bowie_text_unq unique (text)
*
ERROR at line 1:

ORA-01450: maximum key length (6398) exceeded

We hit our problem again. Oracle tries to make a unique index on the Text column, but it can’t because the extended column definition could potentially exceed the maximum allowable key length.

We can get around this in a similar fashion, but by adding a virtual hash column to the table and basing the Unique constraint on this column instead:

SQL> drop index bowie_hash_text_i;

Index dropped.

SQL> alter table bowie add (text_hash as (standard_hash(text)));

Table altered.

SQL> alter table bowie add constraint bowie_text_unq unique (text_hash);

Table altered.

This can now be used to effectively protect the uniqueness of the original Text column:

SQL> insert into bowie (id, text) values (1000001, '42BOWIE');
insert into bowie (id, text) values (1000001, '42BOWIE')
*
ERROR at line 1:

ORA-00001: unique constraint (BOWIE.BOWIE_TEXT_UNQ) violated

This index can now be used in a similar manner as before for equality based predicates:

SQL> select * from bowie where text = '42BOWIE';

ID  TEXT       TEXT_HASH
--- ---------- ----------------------------------------
44     42BOWIE A2C98939EDB479BC3EB0CDC560DDCD1575D47F62

Execution Plan
----------------------------------------------------------

Plan hash value: 2691947611
----------------------------------------------------------------------------------------------
| Id  | Operation                   | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                |     1 |    16 |     2   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS BY INDEX ROWID| BOWIE          |     1 |    16 |     2   (0)| 00:00:01 |
|*  2 |   INDEX UNIQUE SCAN         | BOWIE_TEXT_UNQ |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter(INTERNAL_FUNCTION("TEXT") AND "TEXT"='42BOWIE')
2 - access("BOWIE"."TEXT_HASH"=HEXTORAW('A2C98939EDB479BC3EB0CDC560DDCD1575D47F62'))

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
3  consistent gets
0  physical reads
0  redo size
702  bytes sent via SQL*Net to client
544  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
1  rows processed

But with the same restrictions with range based predicates:

SQL> select * from bowie where text between '429BOWIE' and '42BOWIE';

ID  TEXT       TEXT_HASH
--- ---------- ----------------------------------------
44     42BOWIE A2C98939EDB479BC3EB0CDC560DDCD1575D47F62
431   429BOWIE A7E2B59E1429DB4964225E7A98A19998BC3D2AFD

Execution Plan
----------------------------------------------------------

Plan hash value: 1845943507
---------------------------------------------------------------------------
| Id  | Operation         | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |       |     2 |    32 |   208   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| BOWIE |     2 |    32 |   208   (2)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(INTERNAL_FUNCTION("TEXT") AND "TEXT"<='42BOWIE' AND "TEXT">='429BOWIE')

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
758  consistent gets
0  physical reads
0  redo size
787  bytes sent via SQL*Net to client
544  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2  rows processed

I’ll look at other indexing options with these new extended columns in Part II.

Comments»

1. Colin 't Hart - September 13, 2013: I really hope that people don’t expect to be able to use unique constraints on such “long” columns. Functionally these should be used for comment and description type fields, not for permitting someone to have a street address which is 4001 characters long.

As far as range searches go, an index on substr(column, 1, n) for a suitable value of n would allow one to search on the first n characters of these columns, a useful compromise I think.

The other thing to bear in mind is that large indexes are slower 🙂

LikeLike

Reply
Richard Foote - September 13, 2013: Hi Colin

Yes, I agree that requiring a unique constraint on an extended columns would be a tad unusual 🙂

I’ll discuss the use of substr functions in Part II.

LikeLike

Reply
2. Oren Nakdimon - September 13, 2013: Hi Richard.

Regarding the unique constraint on the hash function, there is a small (tiny perhaps) chance that it will prevent inserting two legitimate records with different values that are hashed to the same value, right?

LikeLike

Reply
Richard Foote - September 16, 2013: Hi Oren

I don’t know enough about the various hash algorithms used here (the default is SHA1, but you can use a variety of other algorithms if you’re concerned) but the chance of a collision would have to be classed as being extremely extremely remote.

LikeLike

Reply
Al - September 17, 2013: Indeed, the probability is so “extremely extremely remote” that you’ll have to wait some time before you get some ORA-1 in you nightly batch processing on production… And then you’ll need quite some time before you understand what happens.

I have already seen some file-processing batch ignoring a file because it had the same hash as another file already processed (wasn’t SHA-1, i guess. Maybe md5sum, IIRC)… so I’m not fond of this hash-base-uniqueness idea, although I guess there might be valid use cases for this (which is why I won’t claim it is useless).

If I were be a DBA (I’m a developer), and if a developer sent me code like this, I’d ask him why he believes using a no-so-unique-index is reasonable in his use case… and not let him do it without a good explanation. I do not need hard-to-diagnose ORA-1 and ignored “duplicates” on live systems 🙂

LikeLike

Reply
3. Richard Foote - November 14, 2013: Hi Al

If you’re paranoid about the possibility of a collision with the default SHA-1 hash algorithm which has a theoretical 2 to the 60 chance of a collision, you can simply use the SHA-256 or above supported algorithms which I don’t believe have a reported collision.

LikeLike

Reply
4. 32K Columns | Oracle Scratchpad - November 14, 2013: […] Part 1 […]

LikeLike

Reply
5. Sayan Malakshinov - November 15, 2013: But we can create tablespace with bigger blocksize for such indexes. For example with max blocksize=32K, we can store up to 26510 bytes, so we can create indexes on columns up to varchar2(26510 bytes).

For example:
SQL> create index ix_t_varchar16000 on t_varchar16000(v) tablespace ts_32k;

Index created.

LikeLike

Reply
Richard Foote - November 15, 2013: Hi Sayan

Thanks for that, indeed you could.

But I question whether you would actually want to do this. You would have to create a separate, manually tuned 32K block buffer cache and the index segment would be enormous compared to either hash based or substr alternatives while providing little additional benefit.

LikeLike

Reply
Sayan Malakshinov - November 15, 2013: Hi Richard!
i totally agree about usage of hashes and substrings – it’s time-tested approach, especially for clobs. I just wanted to add this to full options list

LikeLike

Reply
6. From MPhil to PhD; GSTAR update | GSTAR - July 11, 2015: […] as the enterprise level security are quite simply a hindrance rather than a help not to mention indexing. More critically, I found working with the Jena Connector and GeoSPARQL to be fraught with the […]

LikeLike

Reply
7. Error Logging | Oracle Scratchpad - August 24, 2018: […] Error logging is a topic that I’ve mentioned a couple of times in the past, most recently as a follow-up in a discussion of the choices for copying a large volume of data from one table to another, but originally in an addendum about a little surprise you may get when you use extended strings (max_string_size = EXTENDED). […]

LikeLike

Reply

	PP on Costing Concatenated Indexes W…
	Zeeshan Moin on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	PP on The End Of Richard Foote…
	Ram on The Fake Index Trap (“No…
	Richard Foote on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	NormanDunbar on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Richard Foote on The End Of Richard Foote…
	Sayan Malakshinov on The End Of Richard Foote…

Richard Foote's Oracle Blog