jump to navigation

Introduction To Linguistic Indexes – Part II January 9, 2008

Posted by Richard Foote in Indexing Tricks, Linguistic Indexes, Oracle Cost Based Optimizer, Oracle Indexes, Performance Tuning.
6 comments

As previously discussed, Linguistic Indexes can potentially be useful with case-insensitive searches and sorts.

However, they have a number of issues and disadvantages.

The first issue is that once the NLS_COMP parameter is set to ‘LINGUISTIC’ and the NLS_SORT parameter is set to something other than ‘BINARY’, standard binary indexes can no longer be used and are ignored by the CBO. This means one needs to have a very careful and consistent indexing strategy to ensure no SQL statements are compromised while Linguistic related NLS parameters are set. Simple demo highlighting issues with mixing Linguistic and Binary Indexes here.  Note these demos follow those in Introduction To Linguistic Indexes Part I.

The next issue is that Linguistic Indexes are ignored for some types of predicate conditions. MIN, MAX and LIKE can not be used with Linguistic Indexes (although LIKE can now be used with 11g). Simple demo highlighting problems with these predicate conditions here.

Finally, Linguistic Indexes typically use more storage than Binary indexes and so have more associated overheads and costs. The differences in storage is dependent on the charactersets associated with the various indexes. Some examples of differences shown here. Warning: This demo has lots of block dumps !!

Linguistic Indexes are worthy of consideration, but so are the associated costs and disadvantages.

Introduction To Linguistic Indexes – Part I January 3, 2008

Posted by Richard Foote in Index Access Path, Indexing Tricks, Linguistic Indexes, Oracle Indexes.
9 comments

Characters are sorted by default based on numeric values defined by the default character encoding scheme (known as Binary Sorting). For us Australians, this is fine as we (generally) speak English and the English alphabet is nicely sorted in ascending order by ASCII and EBCDIC standards. However, many other languages are not so fortunate as the binary sort does not sort the data in many language’s alphabetic sort order.  Oracle has many Globalization Support features to help users in other languages get over these issues (all very interesting and topics for many a Blog entry in the future).

However, even us Australians have issues when it comes to “case-insensitive” searches, where data may be stored in many different cases (eg. Ziggy, ZIGGY, ZiGgY, etc.) and we want to return all data that matches a character value, regardless of its case.

The issue of course is that by default, all text searches are case-sensitive. For example a search WHERE name=’ZIGGY’ will only return ‘ZIGGY’ but not ‘Ziggy’ or ‘ZiGgY’ etc.

The standard fix is for the application to convert the data to a consistent case when performing the search. For example a search WHERE UPPER(Name) = ‘ZIGGY’ will return all values of “ZIGGY” regardless of their case but this will negate the use of any standard index on the Name column.

Therefore, a Function-Based index is required, say based on UPPER(Name), to ensure an efficient index access is possible for case insensitive searches.

However, this often requires an additional index to be created and for the application to be explicitly written to make use of the function-based index defined function.

Now the best cure for this problem is simply to ensure all data is stored in a consistent case (ZIGGY, ZIGGY, ZIGGY) but this may not always be practical or even desirable in some cases.

Another possible solution is the use of a Linguistic Index. This is an index that is created based on a specific case insensitive linguistic language or multilingual option that ensures the index entries are sorted in the linguistic language order, not on the default binary order of the database encoding scheme.

Basic steps are:

1) Create a Linguistic Index, eg.

CREATE INDEX case_search_ling_name_i ON case_search(NLSSORT(name,’NLS_SORT=GENERIC_M_CI’));

2) Set NLS_SORT in the session (or set parameter) to use the required Linguistic sort option , eg.

ALTER SESSION SET NLS_SORT=’GENERIC_M_CI';   

Simply append _CI in the Linguistic sort option to make it Case-Insensitive or _AI to make it Accent-Insensitive.

(Note: if binary ordering is generally adequate, NLS_SORT can simply be set to ‘BINARY_CI’ for Binary Case-Insensitive searches)

3) Set NLS_COMP in the session (or set parameter) to use Linguistic Sorts/Case Insensitive Searches, eg.

ALTER SESSION SET NLS_COMP=’LINGUISTIC';

A search now based on WHERE name=’ZIGGY’ will automatically perform a case-insensitive search without the need to modify the application to use specific functions.

For a full demo, see Use Linguistic Indexes Demo.

However, before you rush out and start using Linguistic Indexes to possibly simplify the use of case insensitive searches, note there are various disadvantages to Linguistic Indexes, which can somewhat dampen their appeal. These will be covered in Part II of this series.

Follow

Get every new post delivered to your Inbox.

Join 1,894 other followers