September 23, 2008

Unloading Tables with XML Columns

As its name implies, the UNLOAD utility is used to unload data from one or multiple source objects (either DB2 table spaces or DB2 image copy data sets). UNLOAD is a robust utility with many features, including the capability to unload entire rows from a table space or specific partitions within a table space. You can specify which columns to extrat, as well as filter based on values in one or more columns.

Let's review specific parameters for unloading XML columns into a separate file from the relational data within a table. To learn more about other UNLOAD features, see "DB2 Version 9.1 for z/OS Utility Guide and Reference" (published February 2008).

When unloading data from a table that contains one or more XML columns, you have the choice of where the XML data is stored. When the entire output record length is less than 32K, the XML data can be stored with the relational data in one record in a single output file. When the output record is greater than 32K, the XML data must be stored in a separate file. This file is usually a PDS file with a member for each XML document unloaded.

The output record location is determined by the datatype parameter in the UNLOAD control card. To store the XML data with the relational data in one output file, you must use the data type of XML. However, when the output record is greater than 32K, or you just want to store the XML data in a separate file, you must specify a data type of (CHAR or VARCHAR) along with a file indicator parameter of (BLOBF, CLOBF or DBCLOBF). This parameter indicates that the column value stored in the first output file used for the relational data contains a file name pointing to the file containing the XML data. The data stored in this file is in either character, binary or double-byte format, based on the data type (CLOBF, BLOBF, or DBCLOBF).

This is a little confusing, but look at the SYSIN DD input paramaters, and you'll notice the FROM TABLE statement with the XML column INFO. This column is defined as VARCHAR--not XML--and has CLOBF followed by CLOBREC. CLOBREC is the name of the template used to create the file that the INFO XML documents are stored in. In this case, the unloaded XML column is stored as a member in file CSBP.UNLD.CSBPPG13.CLOBREC.

In Figure 1(below, click on figure to see full-size version in a pop-up window), sample JCL is used to unload a table (CSBP.MYCUSTOMER) defined with two columns. The first column, CID, contains BIGINT data; the second column, INFO, is XML data.



//                     SYSTEM='DB9G',UTPROC=''                                    
//SYSPRINT DD SYSOUT=*                                                
//UTPRINT   DD SYSOUT=*                                                
//SYSIN DD *                                                         
  TEMPLATE UNLD   DSN &USERID..UNLD.&TS..SYSREC                      
          UNIT SYSDA  DISP (NEW, CATLG, CATLG) SPACE (30,10) CYL      
          UNIT SYSDA  DISP (NEW, CATLG, CATLG) SPACE (30,10) CYL      
  TEMPLATE LDMAP    DSN &USERID..UNLD.&TS..PUNCH                     
          UNIT SYSDA  DISP (NEW, CATLG, CATLG) SPACE (1,1) TRK       
  UNLOAD TABLESPACE CSBPDB.CSBPPG13                                  
    PUNCHDDN LDMAP                                                   
    UNLDDN UNLD   EBCDIC                                             
    FROM TABLE CSBP.MYCUSTOMER                                       
    (CID       BIGINT,                                                
     INFO      VARCHAR CLOBF CLOBREC                           

The table (CSBP.MYCUSTOMER) only contains one row for this exercise. The relational data is unloaded into the UNLDDDN using the template UNLD. Within this file you see the file name that contains the XML data. Based on the template, the unload relational data is stored in the UNLD template, giving you a file name of CSBP.UNLD.CSBPPG13.SYSREC. 

The contents of the UNLD file in HEX (see Figure 2, below; click on image to see a full-size version in a pop-up window):



Looking at the unloaded data in the SYSREC file above, notice the CID value is 1001 or HEX 3E9, and XML column INFO is unloaded and stored in file CSBP.UNLD.XMYC0000.CLOBREC(B1GVSXXO)

Based on the parameter used for the XML column, the XML data is stored in the file based on a template called CLOBREC. The first row extracted generates a member name (B1GVSXXO) and is stored in a PDS file called CSBP.UNLD.XMYC0000.CLOBREC, which is defined by template CLOBREC within the UNLOAD control statements. Notice the XML document is stored as one row in the PDS member B1GVSXXO (see Figure 3, below; click on image to view a full-size version in a pop-up window):


LDMAP template creates file CSBP.UNLD.CSBPPG13.PUNCH, which contains the LOAD utility parameters to reload the data that was just unloaded (see Figure 4, below; click on image to view a full-size version in a pop-up window):


TEMPLATE B1GVSYKX                                                   
LOAD DATA INDDN B1GVSYKX LOG NO  RESUME YES                         
EBCDIC  CCSID(01047,65534,65534)                                    
SORTKEYS                   1                                       
INTO TABLE "CSBP"."MYCUSTOMER"                                     
WHEN(00001:00002) = X'0028'                                        
( "DSN_NULL_IND_00001" POSITION(  00003)       CHAR(1)             
, "CID"                                                            
  POSITION(  00004:00011) BIGINT                                    
, "DSN_NULL_IND_00002" POSITION(  00012)       CHAR(1)             
, "INFO"                                                            

I found specifying the file location as part of the column definition of the table layout to be quite confusing. It would have been nice just to define a parameter name--just like PUNCHDDN for the load control card and UNLDDN for the unloaded data. Maybe XMLDDN would have been a nice option to hold the XML data. However, that isn't the case. So remember: The location of the XML data is stored in the file allocated to UNLDDN.

If you have questions about or some experiences with this new feature, please post them here.