Introduction To Linguistic Indexes – Part I January 3, 2008Posted by Richard Foote in Index Access Path, Indexing Tricks, Linguistic Indexes, Oracle Indexes.
Characters are sorted by default based on numeric values defined by the default character encoding scheme (known as Binary Sorting). For us Australians, this is fine as we (generally) speak English and the English alphabet is nicely sorted in ascending order by ASCII and EBCDIC standards. However, many other languages are not so fortunate as the binary sort does not sort the data in many language’s alphabetic sort order. Oracle has many Globalization Support features to help users in other languages get over these issues (all very interesting and topics for many a Blog entry in the future).
However, even us Australians have issues when it comes to “case-insensitive” searches, where data may be stored in many different cases (eg. Ziggy, ZIGGY, ZiGgY, etc.) and we want to return all data that matches a character value, regardless of its case.
The issue of course is that by default, all text searches are case-sensitive. For example a search WHERE name=’ZIGGY’ will only return ‘ZIGGY’ but not ‘Ziggy’ or ‘ZiGgY’ etc.
The standard fix is for the application to convert the data to a consistent case when performing the search. For example a search WHERE UPPER(Name) = ‘ZIGGY’ will return all values of “ZIGGY” regardless of their case but this will negate the use of any standard index on the Name column.
Therefore, a Function-Based index is required, say based on UPPER(Name), to ensure an efficient index access is possible for case insensitive searches.
However, this often requires an additional index to be created and for the application to be explicitly written to make use of the function-based index defined function.
Now the best cure for this problem is simply to ensure all data is stored in a consistent case (ZIGGY, ZIGGY, ZIGGY) but this may not always be practical or even desirable in some cases.
Another possible solution is the use of a Linguistic Index. This is an index that is created based on a specific case insensitive linguistic language or multilingual option that ensures the index entries are sorted in the linguistic language order, not on the default binary order of the database encoding scheme.
Basic steps are:
1) Create a Linguistic Index, eg.
CREATE INDEX case_search_ling_name_i ON case_search(NLSSORT(name,’NLS_SORT=GENERIC_M_CI’));
2) Set NLS_SORT in the session (or set parameter) to use the required Linguistic sort option , eg.
ALTER SESSION SET NLS_SORT=’GENERIC_M_CI’;
Simply append _CI in the Linguistic sort option to make it Case-Insensitive or _AI to make it Accent-Insensitive.
(Note: if binary ordering is generally adequate, NLS_SORT can simply be set to ‘BINARY_CI’ for Binary Case-Insensitive searches)
3) Set NLS_COMP in the session (or set parameter) to use Linguistic Sorts/Case Insensitive Searches, eg.
ALTER SESSION SET NLS_COMP=’LINGUISTIC’;
A search now based on WHERE name=’ZIGGY’ will automatically perform a case-insensitive search without the need to modify the application to use specific functions.
For a full demo, see Use Linguistic Indexes Demo.
However, before you rush out and start using Linguistic Indexes to possibly simplify the use of case insensitive searches, note there are various disadvantages to Linguistic Indexes, which can somewhat dampen their appeal. These will be covered in Part II of this series.