December 09, 2008

Caseless Comparison

Here's a question that doesn't always come up during the application design stage, but probably should: "What case do we store the data in?"

Knowing whether the data is stored in upper, lower or mixed case impacts the design of your data entry and data-retrieval panels. With DB2 for z/OS, data is stored in the same case in which it's entered. It's up to the application to store the data as-is or to convert it to upper or lower case.

The reason to consider converting the data--to, for instance, all upper case- is to support indexing, which in turn improves performance during search retrieval. Some applications are written to store the data the way it's entered: as-is. With these types of applications, if I enter my name as "Troy," that's exactly what's stored in the database. If I search on "troy," nothing would be found.

As this example shows, converting data to one case is the best solution. Given that upper is the default case on z/OS, using the built-in DB2 UPPER function makes sense. Here's an example of using UPPER on an insert statement:

   VALUES (UPPER(:ws-firstnme), UPPER(:ws-lastname)) ;

Now in your application you'd search for employees using a given last name that's stored in a host variable called ws-lastname. I assume the search string is already in upper case, but if not, you can use UPPER as I have here:

SELECT firstnme, lastname
FROM employee
WHERE lastname =  UPPER(:ws-lastname)

The problem here, however, is that you must write an edit program to format the data in mixed case for screen display and reports. It would be much better to store the data in mixed case so you avoid having to edit the data for screen display and reports.

Here's an example of how to retrieve the first and last name for employees when the data is stored in mixed case:

SELECT firstnme, lastname
FROM employee
WHERE UPPER(lastname) = UPPER(:ws-lastname)

The problem with using a function on a column like UPPER(lastname) is that this predicate isn't indexable. DB2 cannot use an index on a column that's being referenced in an expression that includes the use of functions. At least that was the case prior to DB2 Version 9. But it's no longer a problem provided you have an index built using the same expression.

Here's an example of how to build an index with expressions. In this case I use the expression UPPER(lastname).

   (UPPER(LASTNAME,’ ‘)) ;

Thanks to DB2 V9, I now have a well-performing caseless comparison of the data stored in DB2 with the host variables used in my application.

To learn more about index with expression, see what I wrote in a previous DB2utor blog entry.