Tuesday, 16 August 2011

CoherentWeb 4.0 Release - Writing an XSLT Editor - Part I

CoherentWeb 4.0 has just been released. This marks the culmination of an effort, lasting well over six months, to build an XSLT editor from scratch and blend it seamlessly with CoherentWeb's XSLT-processing and XPath-analysis tools.

I'm pleased now that I didn't go for the easy option of incorporating a third-party editor into the tool, tempting though it was. Building an XML editor from scratch allowed me to rethink some of the very fundamentals of editor design.

The Initial State

CoherentWeb already had an XML text-rendering system for viewing and analysis. It relied on .NET's XmlReader for parsing, with a fallback parser for XML that isn't well-formed (a modified version of MindTouch's SgmlReader). This approach, though, was clearly unsuited to an editor, where each and every character in the source must be reproduced faithfully, no matter what (almost; more on that later). A fresh think was required:

Considering the XML source

There are two main parts to rendering XML text for viewing: syntax colouring and indentation. Files opened from other sources may or may not already contain padding characters for indentation, so indentation was definitely needed on loading. The optimal solution, of course, would be to use the same code when loading a file as when reformatting text during editing. For the best performance, this meant indentation and syntax colouring had to be performed in the same pass.
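To make the "same pass" idea concrete, here is a minimal Python sketch, assuming a deliberately simplified lexer (no attribute values containing `>`, no CDATA handling). It is not CoherentWeb's parser; the names `TOKEN_RE`, `Token` and `tokenize` are hypothetical. The point it shows is that a single scan can record both the token kind (which drives colouring) and the nesting depth (which drives indentation).

```python
import re
from dataclasses import dataclass

# Simplified lexical pattern (an assumption for this sketch): comments, end tags,
# empty-element tags, other markup (start tags, PIs, declarations), then text.
TOKEN_RE = re.compile(r"<!--.*?-->|</[^>]*>|<[^>]*/>|<[^>]*>|[^<]+", re.S)

@dataclass
class Token:
    kind: str   # drives colouring: "comment", "decl", "end", "empty", "start", "text"
    start: int  # character offset into the source text
    end: int    # offset one past the token's last character
    depth: int  # nesting level, which drives indentation

def tokenize(source: str) -> list[Token]:
    """One pass that yields both the colouring kind and the indent depth."""
    tokens = []
    depth = 0
    for m in TOKEN_RE.finditer(source):
        text = m.group()
        if text.startswith("<!--"):
            kind = "comment"
        elif text.startswith(("<?", "<!")):
            kind = "decl"
        elif text.startswith("</"):
            kind = "end"
            depth -= 1  # an end tag lines up with its start tag
        elif text.endswith("/>"):
            kind = "empty"
        elif text.startswith("<"):
            kind = "start"
        else:
            kind = "text"
        tokens.append(Token(kind, m.start(), m.end(), depth))
        if kind == "start":
            depth += 1  # children of a start tag sit one level deeper
    return tokens
```

Because each token carries its kind and its depth together, a renderer can emit colour and indentation from one scan rather than walking the text twice.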

Using RTF codes for indentation

The base control for rendering text was .NET's RichTextBox (InkEdit is actually used, as it's a later version with better performance). This is effectively the same control WordPad uses in Windows, so it has excellent RTF rendering capability. My natural thought process went along these lines: if you're generating RTF to colour XML text, you can also use RTF to indent it. It didn't even occur to me that inserting padding characters was a viable alternative; adding characters to the text in the middle of parsing would have complicated things further (line-feed characters for XML formatting must still be added if they're not present in the source, but this is less frequent).
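As a rough illustration of indenting with RTF rather than with padding characters, the Python sketch below builds an RTF document in which each line's indent comes from the standard `\li` (left indent, in twips) control word and its colour from `\cf`, an index into `\colortbl`. The colour choices, the 240-twip step and the function names are assumptions for this sketch; the text itself never gains any padding characters.

```python
# Hypothetical colour table: index 1 = tag blue, index 2 = dark red for text.
COLOUR_TABLE = r"{\colortbl;\red0\green0\blue255;\red163\green21\blue21;}"
TWIPS_PER_LEVEL = 240  # assumed indent step per nesting level (1/6 inch)

def rtf_escape(text: str) -> str:
    # Backslash and braces are RTF control characters and must be escaped.
    # Non-ASCII handling (\uN) is omitted from this sketch.
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

def to_rtf(lines: list[tuple[int, int, str]]) -> str:
    """lines: (depth, colour index, text) for each output line."""
    parts = [r"{\rtf1\ansi", COLOUR_TABLE]
    for depth, colour, text in lines:
        # \pard resets paragraph formatting; \li sets the left indent in twips,
        # so indentation is pure formatting and the text stays unchanged.
        parts.append(rf"\pard\li{depth * TWIPS_PER_LEVEL}\cf{colour} {rtf_escape(text)}\par")
    parts.append("}")
    return "\n".join(parts)
```

For example, `to_rtf([(0, 1, "<a>"), (1, 2, "text"), (0, 1, "</a>")])` yields three paragraphs, the middle one indented by 240 twips, while the characters the editor tracks are exactly those of the source.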

Trimming the source

There was just one issue, though, and a big one: what if the XML source already contained padding characters? These would interfere with any RTF-based indentation, so there had to be a way to remove them. So, although we're not adding characters whilst parsing, we are removing them. How can this be kept simple without offset values all over the place? The compromise was to run the parser a second time on the trimmed text, with all RTF-related calls ignored. On the second pass, the only updates required were to the node positions.
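A minimal Python sketch of the two-pass compromise follows, under two stated assumptions: "padding" means spaces and tabs at the start of a line, and only tags count as nodes. Both are simplifications; `trim_padding`, `node_positions` and `load` are hypothetical names, not CoherentWeb code.

```python
import re

# Simplified: only tags are treated as nodes in this sketch.
TAG_RE = re.compile(r"<[^>]*>")

def trim_padding(source: str) -> str:
    """Pass 1 (trimming part): drop spaces/tabs at the start of every line.
    Assumption: leading whitespace is always padding; a real editor would
    leave significant whitespace (e.g. inside text content or CDATA) alone."""
    return "\n".join(line.lstrip(" \t") for line in source.split("\n"))

def node_positions(source: str) -> list[int]:
    """Pass 2: rerun over the trimmed text, recording node offsets only.
    No colouring or RTF output happens here."""
    return [m.start() for m in TAG_RE.finditer(source)]

def load(source: str) -> tuple[str, list[int]]:
    trimmed = trim_padding(source)  # pass 1 is also where RTF would be emitted
    return trimmed, node_positions(trimmed)
```

The second pass trades a little speed for simplicity: rather than adjusting every recorded offset as padding is removed, positions are simply read afresh from the trimmed text.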

The XML Model - A Frozen String

To be really effective, an XML editor needs to know exactly where the text/mouse cursor is at all times, in the context of the XML. A model of the XML is therefore required that maps directly onto character positions. In this design, the internal model stores only node context and the associated character positions, as integers; no text is stored for either names or values. This is similar in concept to the Frozen-Stream pattern, but with a string instead of a stream. As well as tracking text positions, the model needs information on the node hierarchy, both to permit XML-aware operations and to synchronise with a lazy-loading XML outline in the form of a TreeView. The model also needs to be efficient, because it is updated on every keystroke.
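The following Python sketch shows one way a positions-only model could answer "which node is the cursor in?". It assumes nodes are held in document order with a parent index each; `Node`, `FrozenStringModel`, `node_at` and `text_of` are hypothetical names. Lookup is a binary search on start offsets followed by a walk up the hierarchy, and names or values are sliced from the editor text only when needed.

```python
from bisect import bisect_right
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    kind: str    # e.g. "element", "attribute", "text" (hypothetical kinds)
    start: int   # offset of the node's first character in the editor text
    end: int     # offset one past the node's last character
    parent: int  # index of the parent node in the model, -1 for the root

class FrozenStringModel:
    """Stores positions only; names and values are read from the text on demand."""

    def __init__(self, nodes: list[Node]):
        # Nodes must be in document order (sorted by start offset, parents first).
        self.nodes = nodes
        self.starts = [n.start for n in nodes]

    def node_at(self, caret: int) -> int:
        """Index of the innermost node containing caret, or -1 if none."""
        i = bisect_right(self.starts, caret) - 1  # last node starting at or before caret
        # Walk up the hierarchy until a node actually spans the caret.
        while i >= 0 and not (self.nodes[i].start <= caret < self.nodes[i].end):
            i = self.nodes[i].parent
        return i

    def text_of(self, text: str, index: int) -> str:
        node = self.nodes[index]
        return text[node.start:node.end]
```

With properly nested nodes, the innermost node containing the cursor is always an ancestor-or-self of the last node starting at or before it, which is why the upward walk is enough.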

Summary

A lot happens in an XML text editor even before the XML text is rendered. What happens here is critical, though, because it defines the functionality available, and the responsiveness, when the XML text is actually edited. I've skipped over the way nodes are parsed because this is pretty standard stuff; the key point is that rolling your own parser means no character information is lost. It also made non-standard processing possible, such as parsing the content of XML comments independently in case it contains XML, so that XML inside comments is indented properly too (again without padding characters).
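How an editor decides that a comment "contains XML" isn't described here, so the Python sketch below uses an assumed heuristic: wrap the comment body in a dummy root and see whether Python's standard `xml.etree.ElementTree` parses it with at least one child element. The function names are hypothetical. Returning the body's offset lets the same indenting parser be run over just that slice of the text.

```python
import re
import xml.etree.ElementTree as ET

COMMENT_RE = re.compile(r"<!--(.*?)-->", re.S)

def comment_contains_xml(body: str) -> bool:
    """Assumed heuristic: treat a comment as XML if its body, wrapped in a
    dummy root element, parses and contains at least one element."""
    try:
        wrapper = ET.fromstring(f"<wrapper>{body}</wrapper>")
    except ET.ParseError:
        return False
    return len(wrapper) > 0

def xml_comments(source: str) -> list[tuple[int, str]]:
    """Return (offset of comment body, body) for each comment holding XML."""
    return [(m.start(1), m.group(1)) for m in COMMENT_RE.finditer(source)
            if comment_contains_xml(m.group(1))]
```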
