Monday, 27 September 2010

Using XSLT/EXPath with Word 2010

The EXPath Zip module shows great potential for publishers wishing to create documents that are XML based but rely on Zip-Compressed packages to bring related data together into a single file.

A number of well-established standards such as ODF, XPS and ePub use Zip for this purpose, but in this example I chose the OOXML standard, as used by Microsoft Word 2010.


The presentation (above) describes the general approach used, and a screencast (below) shows the XSLT in action. Developing the XSLT itself was relatively straightforward because of the declarative approach EXPath ZIP uses for describing the ZIP file structure and contents. There was just one minor issue: the sample OData file I used had bmp files embedded using Base64 encoding, but the first 78 bytes of each were just header info. I therefore introduced a (non-standard) 'offset' attribute to the zip:entry element so that these first 78 bytes could be ignored.



Links for the files used are below. If you wish to use the XSLT, you will need to change the $word-in-href and $output variables that correspond to the locations of the resource and output files, to suit your own system:

Word Template $word-in-href
XSLT File
OData Input File

Non-Standard Extension to EXPath


As explained in the video presentation, a non-standard 'offset' attribute for the zip:entry element was used to describe the Base64 output. This was because, after conversion from Base64, the first 78 bytes of the binary needed to be discarded as they contained header data that would otherwise have made the bmp file invalid.
Due to this non-standard attribute, this XSLT can only be run unaltered on the CoherentWeb XSLT test tool. If this is found to be a common use case, however, it may be possible for the attribute to be added to the EXPath ZIP specification, which is still awaiting formal documentation.
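
For readers without access to that tool, the effect of the 'offset' attribute can be approximated outside XSLT. The following is only a rough Python sketch of the idea, not the XSLT used in this post: it decodes Base64 text, drops a fixed number of leading bytes, and writes the remainder as a Zip entry. The function names, the entry name word/media/image1.bmp and the sample data are all made up for illustration; only the 78-byte figure comes from the sample file described above.

```python
import base64
import io
import zipfile

# Number of leading header bytes to skip (the figure observed in the sample file).
HEADER_BYTES = 78


def decode_with_offset(b64_text: str, offset: int = HEADER_BYTES) -> bytes:
    """Decode Base64 text and discard the first `offset` bytes."""
    raw = base64.b64decode(b64_text)
    if len(raw) < offset:
        raise ValueError("decoded data is shorter than the offset")
    return raw[offset:]


def add_image_entry(package: zipfile.ZipFile, name: str, b64_text: str,
                    offset: int = HEADER_BYTES) -> None:
    """Write the decoded, header-stripped bytes into the package as `name`."""
    package.writestr(name, decode_with_offset(b64_text, offset))


# Made-up input: 78 filler bytes followed by a stand-in for bitmap data.
fake_header = bytes(HEADER_BYTES)
fake_bitmap = b"BM" + b"stand-in bitmap payload"
encoded = base64.b64encode(fake_header + fake_bitmap).decode("ascii")

# Build the Zip package in memory; the entry name is hypothetical.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
    add_image_entry(package, "word/media/image1.bmp", encoded)

# Read the entry back to confirm that the header bytes were dropped.
with zipfile.ZipFile(buffer) as package:
    stored = package.read("word/media/image1.bmp")
```

Making the offset a parameter rather than a fixed constant mirrors the attribute-based approach: other embedded formats may well carry headers of a different length.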
