March 03, 2009

Using XML Schema Validation

In last week's blog entry, I discussed the XML schema registration process, but I didn't explain what the schema registration is used for. DBAs like it when the DBMS enforces the existence or non-existence of a column (not NULL versus NULL). It's also nice to know that data values are valid for the column's data type (e.g., DECIMAL, DATE, TIME, TIMESTAMP). DATE and TIME data types didn't exist in early DB2 releases, so applications were developed that inserted invalid date data values. It wasn't until DB2 1.3 that the DBMS provided a native data type for DATE and enforced a valid date value. With this provision, the DBMS also could now provide date arithmetic operations to add and subtract dates. The drawback was that people had to go back and cleanup the date data before they could convert the column to a date data type.

Developers love how XML's flexibility gives them the freedom to define the structure and content of XML documents as they choose. They don't have to wait for a DBA to insert or delete a column from their table. However, to avoid problems associated with not enforcing data types, the developer must ensure that the application is using the correct structure and data types defined for the XML document. While DB2 doesn't provide a DBMS-enforced method for validating the XML data, it does provide a user-defined function (SYSFUN.DSN_XMLVALIDATE) that validates the XML document based on a named XML schema. As I noted last week, an XML schema includes a description of the XML document, along with constraints on the document's structure and contents. Before you can invoke the SYSFUN.DSN_XMLVALIDATE function, all schema documents that make up an XML schema must be registered with the built-in XML schema repository (XSR).

Invoking XML schema validation creates a significant performance overhead. You use two to three times more CPU than you would if you didn't validate the document. For that reason, XML schema validation should be used during development and testing to verify that the application is working correctly--but turn off schema validation in production. One way to do this is to define triggers in the test environment that invoke DSN_VALIDATE. Make sure you don't create these triggers in a production environment.

If you're currently working with the DSN_VALIDATE function, please share your experiences in comments.